
pixie-io / pixie


Instant Kubernetes-Native Application Observability

Home Page: https://px.dev

License: Apache License 2.0

Makefile 0.14% Python 4.57% Shell 1.54% Go 22.19% Starlark 6.14% HCL 0.01% C++ 52.77% C 1.84% PLpgSQL 0.02% JavaScript 0.22% TypeScript 9.83% Java 0.04% Dockerfile 0.15% HTML 0.01% PHP 0.23% Ruby 0.25% Scala 0.01% Thrift 0.01% Lua 0.04%
golang kubernetes ebpf vega monitoring gke eks aks minikube machine-learning

pixie's Introduction

Pixie!


Docs Slack Twitter Mentioned in Awesome Kubernetes Mentioned in Awesome Go Build Status codecov FOSSA Status Artifact HUB Go Report Card CII Best Practices CLOMonitor OpenSSF Scorecard


Pixie is an open-source observability tool for Kubernetes applications. Use Pixie to view the high-level state of your cluster (service maps, cluster resources, application traffic) and also drill down into more detailed views (pod state, flame graphs, individual full-body application requests).

Why Pixie?

Three features enable Pixie's magical developer experience:

  • Auto-telemetry: Pixie uses eBPF to automatically collect telemetry data such as full-body requests, resource and network metrics, application profiles, and more. See the full list of data sources here.

  • In-Cluster Edge Compute: Pixie collects, stores and queries all telemetry data locally in the cluster. Pixie uses less than 5% of cluster CPU and in most cases less than 2%.

  • Scriptability: PxL, Pixie’s flexible Pythonic query language, can be used across Pixie’s UI, CLI, and client APIs.
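
For a flavor of PxL, here is a minimal sketch (using the http_events table and API calls that appear elsewhere on this page; the exact column set is an assumption) that pulls recent HTTP spans and tags them with their Kubernetes pod:

# Minimal PxL sketch: last 5 minutes of HTTP spans, annotated with pod metadata.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.pod = df.ctx['pod']        # attach Kubernetes pod context
px.display(df.head(100))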

Use Cases

Network Monitoring

Network Flow Graph


Use Pixie to monitor your network, including:

  • The flow of network traffic within your cluster.
  • The flow of DNS requests within your cluster.
  • Individual full-body DNS requests and responses.
  • A map of TCP drops and TCP retransmits across your cluster.

For more details, check out the tutorial or watch an overview.
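
As a rough illustration, a PxL sketch along these lines could count DNS requests per pod. The dns_events table name and schema are assumptions here, not something this page confirms:

# Hedged sketch: DNS request counts per pod over the last 5 minutes.
# 'dns_events' is an assumed table name; adjust to what your Pixie version exposes.
import px

df = px.DataFrame(table='dns_events', start_time='-5m')
df.pod = df.ctx['pod']
df = df.groupby(['pod']).agg(requests=('pod', px.count))
px.display(df)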


Infrastructure Health

Infrastructure Monitoring


Monitor your infrastructure alongside your network and application layer, including:

  • Resource usage by Pod, Node, Namespace.
  • CPU flame graphs per Pod, Node.

For more details, check out the tutorial or watch an overview.
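
As a hedged sketch, per-pod resource usage could be approximated like this; the process_stats table is mentioned in an issue below, but the rss_bytes and cpu_utime_ns column names are assumptions:

# Hedged sketch: average resident memory and CPU time per pod.
# Column names are assumed and may differ between Pixie releases.
import px

df = px.DataFrame(table='process_stats', start_time='-5m')
df.pod = df.ctx['pod']
df = df.groupby(['pod']).agg(
    avg_rss=('rss_bytes', px.mean),
    avg_cpu_utime=('cpu_utime_ns', px.mean),
)
px.display(df)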


Service Performance

Service Performance


Pixie automatically traces a variety of protocols. Get immediate visibility into the health of your services, including:

  • The flow of traffic between your services.
  • Latency per service and endpoint.
  • Sample of the slowest requests for an individual service.

For more details, check out the tutorial or watch an overview.
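
For example, a small PxL sketch (reusing the http_events table that appears elsewhere on this page; treat the details as assumptions) can count requests per service:

# Hedged sketch: request counts per service over the last 5 minutes.
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df = df[df.service != '']
df = df.groupby(['service']).agg(throughput=('service', px.count))
px.display(df)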


Database Query Profiling

Database Query Profiling


Pixie automatically traces several different database protocols. Use Pixie to monitor the performance of your database requests:

  • Latency, error, and throughput (LET) rate for all pods.
  • LET rate per normalized query.
  • Latency per individual full-body query.
  • Individual full-body requests and responses.

For more details, check out the tutorial or watch an overview.
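
As a starting point, a hedged PxL sketch in the spirit of the draft script further down this page (which uses the mysql_events table) could count MySQL requests per pod:

# Hedged sketch: MySQL request counts per pod over the last minute.
# Other protocol tables would be analogous.
import px

df = px.DataFrame(table='mysql_events', start_time='-60s')
df.pod = df.ctx['pod']
df = df.groupby(['pod']).agg(requests=('pod', px.count))
px.display(df)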


Request Tracing

Request Tracing


Pixie makes debugging the communication between microservices easy by providing immediate and deep (full-body) visibility into requests flowing through your cluster.


For more details, check out the tutorial or watch an overview.


Continuous Application Profiling

Continuous Application Profiling


Use Pixie's continuous profiling feature to identify performance issues within application code.


For more details, check out the tutorial or watch an overview.


Distributed bpftrace Deployment

Use Pixie to deploy a bpftrace program to all of the nodes in your cluster. After deploying the program, Pixie captures the output into a table and makes the data available to be queried and visualized in the Pixie UI. TCP Drops are pictured. For more details, check out the tutorial or watch an overview.

Dynamic Go Logging

Debug Go binaries deployed in production environments without needing to recompile and redeploy. For more details, check out the tutorial or watch an overview.


Get Started

Request Tracing

It takes just a few minutes to install Pixie. To get started, check out the Install Guides.


Once installed, you can interact with Pixie using the web UI, CLI, or client APIs.


Get Involved

Pixie is a community-driven project; we welcome your contribution! For code contributions, please read our contribution guide.


Latest Releases

We version Pixie's components separately, so the release GitHub shows as the "latest" will only be the latest for one of the components. We maintain links to the latest releases for all components here.

Changelog

The changelog is stored in annotated git tags.

For vizier:

git for-each-ref refs/tags/release/vizier/$tagname --format='%(tag) %(contents)'

For the CLI:

git for-each-ref refs/tags/release/cli/$tagname --format='%(tag) %(contents)'

These are also published on the releases page.

Adopters

The known adopters and users of Pixie are listed here.

Software Bill of Materials

We publish a list of all the components Pixie depends on and the corresponding versions and licenses here.

Acknowledgements

The Pixie project would like to thank Equinix Metal via the CNCF Community Infrastructure Lab for graciously providing compute resources to run all the CI/CD for the project.

About Pixie

Pixie was contributed by New Relic, Inc. to the Cloud Native Computing Foundation as a Sandbox project in June 2021.

License

Pixie is licensed under Apache License, Version 2.0.


pixie's Issues

etcd and nats fail to deploy on kubernetes version 1.18.0

Describe the bug
The api versions for the Pixie nats and etcd deployments do not work when run on a kubernetes v1.18.0 cluster.

To Reproduce
Steps to reproduce the behavior:

  1. start a minikube cluster with the default kubernetes version
  2. install pixie
  3. deploy pixie
  4. See error about etcd and yaml during deploy.

Expected behavior
Pixie should deploy successfully after running px deploy. It failed on a second retry as well.

Logs

✕    Deploying etcd  ERR: no matches for kind "EtcdCluster" in version "etcd.database.coreos.com/v1beta2

 ✕    Deploying NATS  ERR: no matches for kind "NatsCluster" in version "nats.io/v1alpha2"
[0076] FATAL Failed to deploy Vizier deps

App information (please complete the following information):

  • Pixie version 0.1.27+Distribution.6e8b31d.20200403111855.1
  • K8s cluster version v1.18.0

Additional context
v1.18.0 is the default version of k8s that minikube comes with. This problem can be avoided on minikube by passing --kubernetes-version v1.17.0 to minikube start.

Auto-close the "you can now close this window" browser page

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
Add a javascript tag or something to automatically close the window after a short delay, so I don't have to actually interact with my browser during the setup. Shouldn't be complicated to add as you already control the page that's shown after the authorization :)

Describe alternatives you've considered
None

┆Issue is synchronized with this Jira Task by Unito

Support ARM CPUs

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Follow this tutorial to deploy Microk8s on a pi cluster. I used the channel for Microk8s 1.19
  2. Enable the following features in microk8s:
addons:
  enabled:
    dashboard            # The Kubernetes dashboard
    dns                  # CoreDNS
    ha-cluster           # Configure high availability on the current node
    host-access          # Allow Pods connecting to Host services smoothly
    ingress              # Ingress controller for external access
    metrics-server       # K8s Metrics Server for API access to service metrics
    rbac                 # Role-Based Access Control for authorisation
    registry             # Private image registry exposed on localhost:32000
    storage              # Storage class; allocates storage from host directory
  3. Create an account with cluster admin privilege using certificates.
  4. Create a kubeconfig on an external instance of kubectl using that cluster admin account
  5. Install Pixie alongside kubectl
  6. Deploy the Pixie Demo
  7. Deploy Pixie
  8. Pods fail in various states. Log output for each pod below.

Expected behavior
Successful deployment.

Logs

boboysdadda@DESKTOP-US92ARK:~$ px deploy
Pixie CLI

Running Cluster Checks:
 ✔    Kernel version > 4.14.0
 ✔    Cluster type is supported
 ✔    K8s version > 1.12.0
 ✔    Kubectl > 1.10.0 is present
 ✔    User can create namespace
Installing version: 0.5.2
Generating YAMLs for Pixie
Deploying Pixie to the following cluster: microk8s

Is the cluster correct? (y/n) [y] : y
Found 5 nodes
 ✔    Creating namespace
 ✔    Deleting stale Pixie objects, if any
 ✔    Deploying secrets and configmaps
 ✔    Deploying Cloud Connector
 ⠼    Waiting for Cloud Connector to come online
[0142] FATAL Timed out waiting for cluster ID assignment
boboysdadda@DESKTOP-US92ARK:~$ kubectl get pods -n pl
NAME                                      READY   STATUS              RESTARTS   AGE
vizier-cloud-connector-5696d4d66b-2td4h   0/1     ContainerCreating   0          20h
cert-provisioner-job-kjw6n                0/1     Error               0          20h
cert-provisioner-job-trgcv                0/1     Error               0          20h
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod cert-provisioner-job-kjw6n -n pl
Name:         cert-provisioner-job-kjw6n
Namespace:    pl
Priority:     0
Node:         pi4-k8s-node4/192.168.2.14
Start Time:   Tue, 06 Oct 2020 21:53:08 -0600
Labels:       app=pl-monitoring
              component=vizier
              controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
              job-name=cert-provisioner-job
              vizier-bootstrap=true
Annotations:  cni.projectcalico.org/podIP: 10.1.217.16/32
              cni.projectcalico.org/podIPs: 10.1.217.16/32
Status:       Failed
IP:           10.1.217.16
IPs:
  IP:           10.1.217.16
Controlled By:  Job/cert-provisioner-job
Containers:
  provisioner:
    Container ID:   containerd://19bda16f93f349759f9d54878904d2d1eb0003d68b244bc8f393a60270fa7545
    Image:          gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
    Image ID:       gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 06 Oct 2020 21:53:11 -0600
      Finished:     Tue, 06 Oct 2020 21:53:11 -0600
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      pl-cloud-config  ConfigMap  Optional: false
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  updater-service-account-token-4d7gj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  updater-service-account-token-4d7gj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                    Message
  ----    ------     ----  ----                    -------
  Normal  Scheduled  20h   default-scheduler       Successfully assigned pl/cert-provisioner-job-kjw6n to pi4-k8s-node4
  Normal  Pulled     20h   kubelet, pi4-k8s-node4  Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
  Normal  Created    20h   kubelet, pi4-k8s-node4  Created container provisioner
  Normal  Started    20h   kubelet, pi4-k8s-node4  Started container provisioner
Name:         cert-provisioner-job-trgcv
Namespace:    pl
Priority:     0
Node:         pi4-k8s-node4/192.168.2.14
Start Time:   Tue, 06 Oct 2020 21:53:12 -0600
Labels:       app=pl-monitoring
              component=vizier
              controller-uid=32a94f06-a0fb-4db7-889d-98ea8636cfa4
              job-name=cert-provisioner-job
              vizier-bootstrap=true
Annotations:  cni.projectcalico.org/podIP: 10.1.217.17/32
              cni.projectcalico.org/podIPs: 10.1.217.17/32
Status:       Failed
IP:           10.1.217.17
IPs:
  IP:           10.1.217.17
Controlled By:  Job/cert-provisioner-job
Containers:
  provisioner:
    Container ID:   containerd://07ec2b65d5f38a892341e639b082fb6968ccee51d1c64c3085e228ca034c1f71
    Image:          gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2
    Image ID:       gcr.io/pixie-prod/vizier/cert_provisioner_image@sha256:e76e704b00259fe4f8bee6b8761b3676dc57a3119f153a1f980ca390f2387a9b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 06 Oct 2020 21:53:14 -0600
      Finished:     Tue, 06 Oct 2020 21:53:14 -0600
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      pl-cloud-config  ConfigMap  Optional: false
    Environment:       <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from updater-service-account-token-4d7gj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  updater-service-account-token-4d7gj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  updater-service-account-token-4d7gj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                    Message
  ----    ------     ----  ----                    -------
  Normal  Scheduled  20h   default-scheduler       Successfully assigned pl/cert-provisioner-job-trgcv to pi4-k8s-node4
  Normal  Pulled     20h   kubelet, pi4-k8s-node4  Container image "gcr.io/pixie-prod/vizier/cert_provisioner_image:0.5.2" already present on machine
  Normal  Created    20h   kubelet, pi4-k8s-node4  Created container provisioner
  Normal  Started    20h   kubelet, pi4-k8s-node4  Started container provisioner
boboysdadda@DESKTOP-US92ARK:~$ kubectl describe pod -n pl vizier-cloud-connector-5696d4d66b-2td4h
Name:           vizier-cloud-connector-5696d4d66b-2td4h
Namespace:      pl
Priority:       0
Node:           pi4-k8s-node4/192.168.2.14
Start Time:     Tue, 06 Oct 2020 21:53:08 -0600
Labels:         app=pl-monitoring
                component=vizier
                name=vizier-cloud-connector
                plane=control
                pod-template-hash=5696d4d66b
                vizier-bootstrap=true
Annotations:    fluentbit.io/parser: logfmt
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/vizier-cloud-connector-5696d4d66b
Containers:
  app:
    Container ID:
    Image:          gcr.io/pixie-prod/vizier/cloud_connector_server_image:0.5.2
    Image ID:
    Port:           50800/TCP
    Host Port:      50800/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get https://:50800/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      pl-cloud-config                      ConfigMap  Optional: false
      pl-cloud-connector-tls-config        ConfigMap  Optional: false
      pl-cloud-connector-bootstrap-config  ConfigMap  Optional: true
    Environment:
      PL_POD_NAME:                 vizier-cloud-connector-5696d4d66b-2td4h (v1:metadata.name)
      PL_JWT_SIGNING_KEY:          <set to the key 'jwt-signing-key' in secret 'pl-cluster-secrets'>  Optional: false
      PL_CLUSTER_ID:               <set to the key 'cluster-id' in secret 'pl-cluster-secrets'>       Optional: true
      PL_SENTRY_DSN:
      PL_DEPLOY_KEY:               <set to the key 'deploy-key' in secret 'pl-deploy-secrets'>  Optional: true
      PL_POD_NAMESPACE:            pl (v1:metadata.namespace)
      PL_MAX_EXPECTED_CLOCK_SKEW:  2000
      PL_RENEW_PERIOD:             1000
    Mounts:
      /certs from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from cloud-conn-service-account-token-jlg49 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  service-tls-certs
    Optional:    false
  cloud-conn-service-account-token-jlg49:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-conn-service-account-token-jlg49
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                   From                    Message
  ----     ------       ----                  ----                    -------
  Normal   Scheduled    20h                   default-scheduler       Successfully assigned pl/vizier-cloud-connector-5696d4d66b-2td4h to pi4-k8s-node4
  Warning  FailedMount  20h (x12 over 20h)    kubelet, pi4-k8s-node4  MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
  Warning  FailedMount  20h (x4 over 20h)     kubelet, pi4-k8s-node4  Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition
  Warning  FailedMount  84s (x12 over 9m38s)  kubelet, pi4-k8s-node4  MountVolume.SetUp failed for volume "certs" : secret "service-tls-certs" not found
  Warning  FailedMount  49s (x4 over 7m35s)   kubelet, pi4-k8s-node4  Unable to attach or mount volumes: unmounted volumes=[certs], unattached volumes=[certs cloud-conn-service-account-token-jlg49]: timed out waiting for the condition

App information (please complete the following information):

  • Pixie version - 0.3.9+Distribution.8d5651b.20201006163355.1
  • K8s cluster version - Microk8s 1.19 on Pi4 cluster
ubuntu@pi4-k8s-master:~$ kubectl get nodes
NAME             STATUS   ROLES    AGE   VERSION
pi4-k8s-master   Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node4    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node1    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node3    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68
pi4-k8s-node2    Ready    <none>   32h   v1.19.0-34+ff9309c628eb68

Live view of infrastructure utilization

// filing on behalf of @domingusj

Describe the question you are looking to answer
What is the resource utilization of my cluster and am I close to true/optimal capacity?

Example scenario
This is one of the most important and frequent questions ops people have every day, so that they can scale and manage resources before failure.

It is hard to measure and report well. Existing tools don't do a great job, and it would be great to get a useful view out of the box.

Desired output

  1. Widgets: Node level USE metrics: http://www.brendangregg.com/usemethod.html
  2. Widgets: Summary of the performance of workloads/services running on the nodes to give an indicator of stress (LET by pods )

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? Yes
  • Do you want to persist this view in your team's git repo (yes/no)? Yes
  • Do you want to persist this view in pixie-community's git repo (yes/no)? Yes

Additional requirements

  • Ideally, pixie should build an understanding of expected load and use that to give a measure of true capacity

(Optional) Draft scripts
n/a

┆Issue is synchronized with this Jira Bug by Unito

Pixie does not support multiple kubernetes config files in KUBECONFIG

Describe the bug
KUBECONFIG can contain multiple paths, separated the same way as PATH.
Pixie seems to expect a single file path value, since it attempts to stat the whole string.
It should instead read those files in order until it finds the configuration that works for the selected cluster.

To Reproduce
Steps to reproduce the behavior:

  1. Set your KUBECONFIG to have multiple files stacked, export KUBECONFIG=config1:config2
  2. Run px deploy
  3. See error

Expected behavior
Handle the fact that the kubernetes config can be split across multiple files.

Side note: Deploying to Pixe Cluster ID: should read Deploying to Pixie Cluster ID: (typo at Pixie)

Screenshots

$ px deploy
[0000]  INFO Pixie CLI

  ___  _       _
 | _ \(_)__ __(_) ___
 |  _/| |\ \ /| |/ -_)
 |_|  |_|/_\_\|_|\___|

Deploying to Pixe Cluster ID: 5dbb3c2f-d2dd-44f5-a550-6e81ec381d70.
Deploying Pixie to the following cluster: minikube

Is the cluster correct? (y/n) [y] : y
[0002] FATAL Could not build kubeconfig error=stat /Users/xaf/.kube/config:/Users/xaf/.kube/<redacted>: no such file or directory

Logs
Not needed for this.

App information (please complete the following information):

  • Pixie version: 0.1.22+Distribution.7603c41.20200320124920.2
  • K8s cluster version: v1.11.10

Support Job mapping (similar to Service/Pod)

Is your feature request related to a problem? Please describe.
Jobs are not queryable as their own entities at the moment; the best we can do right now is look at the Pods that back the Job. We would like to be able to query data based on Job name rather than on the backing pod.

Describe the solution you'd like
Be able to query Job stats like we do with Services in px/service_info.

Describe alternatives you've considered
Using pods as the basis works, but it's unfortunately not as clean as querying the Job directly; a rough sketch of that workaround follows.
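
The sketch below is purely illustrative: 'job_name' is a placeholder, and it simply filters a table by pods whose names contain the Job name.

# Hedged sketch of the pod-name workaround for querying Job traffic.
import px

job_name = 'my-job'    # placeholder; substitute your Job's name
df = px.DataFrame(table='http_events', start_time='-5m')
df.pod = df.ctx['pod']
df = df[px.contains(df.pod, job_name)]
px.display(df)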

┆Issue is synchronized with this Jira Bug by Unito

Configure the install path when curling

Is your feature request related to a problem? Please describe.
The install script currently forces installation into /usr/local/bin, while I'd prefer to use a different directory.

$ bash -c "$(curl -fsSL https://work.withpixie.ai/install.sh)"
==> INSTALL INFO:
The PX CLI will be installed to:
- Install PATH: /usr/local/bin

Continue (Y/n):

Describe the solution you'd like
I'd like to have a prompt (maybe defaulting to /usr/local/bin) allowing me to decide where I want this to be installed. Like:

$ bash -c "$(curl -fsSL https://work.withpixie.ai/install.sh)"
==> INSTALL INFO:
The PX CLI will be installed to:
- Install PATH (/usr/local/bin): /some/other/path

Will install Pixie at: /some/other/path
Continue (Y/n):

Describe alternatives you've considered
Support BASH environment variable by only setting the value of INSTALL_PATH if it's not already set by the environment.

px/most_http_data doesn't return results

Describe the bug
px/most_http_data should return HTTP information about the pod producing the most HTTP data. It doesn't appear to return any results right now.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the UI
  2. Navigate to px/most_http_data
  3. Notice that there are no results

Expected behavior
There should be output rows

Screenshots
image

Logs
Please attach the logs by running the following command:

./px collect-logs

App information (please complete the following information):

  • Pixie version 0.5.1
  • K8s cluster version v1.16.11-gke.5

Give PEMs general tolerations so that they can deploy on tainted nodes.

Is your feature request related to a problem? Please describe.
Some clusters use taints and tolerations for workload isolation or other scheduling concerns. This can prevent the vizier-pem pods from scheduling to some nodes.

Describe the solution you'd like
I'd like a flag to enable the vizier-pems to schedule everywhere. Something like a --tolerate flag on the deploy subcommand that creates the vizier-pem daemonset with these tolerations:

      - effect: NoSchedule
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists

Describe alternatives you've considered
I've patched the daemonset after deployment to get around this. If this is deployed as a permanent fixture on the cluster, this would also be handled elsewhere.

Console inoperable on browsers that block http requests from https pages

Describe the bug
Some browsers (for example, Firefox) block http connections while on an https page. Currently the console UI tries to access vizier-proxy-service via px proxy, which serves on http://127.0.0.1:31068; this triggers the blocking behavior.

To Reproduce
Steps to reproduce the behavior:

  1. Open Pixie Console UI in Firefox
  2. Start px proxy
  3. Attempt to use Console UI
  4. Open Firefox developer console to look for Javascript errors.

Expected behavior
The Pixie UI should be able to access its dependencies via some other connectivity as well (directly via nodeport, lb, etc) or a valid https cert and https connectivity should be given to the proxy service.

Logs

From Firefox javascript console.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://127.0.0.1:31067/graphql. (Reason: CORS request did not succeed).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://127.0.0.1:31067/graphql. (Reason: CORS request did not succeed).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://127.0.0.1:31067/graphql. (Reason: CORS request did not succeed).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://127.0.0.1:31067/graphql. (Reason: CORS request did not succeed).

App information (please complete the following information):

  • Pixie version 0.1.15+Distribution.acd8f53.20200207161509.1
  • K8s cluster version Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.7-gke.23", GitCommit:"06e05fd0390a51ea009245a90363f9161b6f2389", GitTreeState:"clean", BuildDate:"2020-01-17T23:10:45Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

┆Issue is synchronized with this Jira Task by Unito

Unable to deploy on minikube with k8s v1.11

// filed on behalf of @XaF

Describe the bug
The UI does not refresh to the console view after deployment.

To Reproduce
Steps to reproduce the behavior:

  1. Download and authorize CLI in minikube
  2. Run deployment command
  3. Pods in the pl namespace start running with flakiness
  4. UI does not refresh to console view.

Expected behavior
After deployment, UI should refresh to console view to execute queries

Screenshots
Shared via zoom call

Logs
Please attach the logs by running the following command:

pixie_logs_20200221141224 (1).zip

App information (please complete the following information):

  • Pixie version: v0.1.16
  • K8s cluster version: v1.11

Additional context
n/a

┆Issue is synchronized with this Jira Task by Unito

Test Issue for Git+Slack

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Overflow of Graphs not Viewable in Live View

Describe the bug
I have a table that doesn't show up in the live view. I have to zoom out to see it, as I can't scroll down.
To Reproduce
Steps to reproduce the behavior:

  1. Go to /live
  2. Choose default script
  3. Edit the placement config on the first graph by increasing its height to 12 (should be sufficient to push the other graphs off the screen)
  4. Try to scroll to see the graphs below, with no luck

Expected behavior
Be able to scroll, or have the window adjust the graphs so they are visible.

Screenshots
Here I've increased the height to 12, and now I can't scroll to see the other tables.
image

Logs
n/a

App information (please complete the following information):

  • Pixie version 0.1.21
  • K8s cluster version: irrelevant

┆Issue is synchronized with this Jira Bug by Unito

`px deploy` doesn't work unless the config is in a single config file

I have my kube config in many kubeconfig files. I use $KUBECONFIG and don't have one config file.

See screenshot below.

  • my $KUBECONFIG
/Users/roopak/.kube/staging-roopakv-ro:/Users/roopak/.kube/staging-roopakv-rw:/Users/roopak/.kube/sandbox-roopakv-rw:/Users/roopak/.kube/sandbox-roopakv-ro:/Users/roopak/.kube/production-roopakv-rw:/Users/roopak/.kube/production-roopakv-ro

Expected behavior
I expect pixie to be able to find cluster info from the right file in kubeconfig.

Screenshots
I see the following issue when I do pixie deploy.
image

Logs
px logs seems to not work.
image

App information (please complete the following information):

  • Pixie version -> 0.1.15+Distribution.acd8f53.20200207161509.1
  • K8s cluster version -> doesn't matter in this case

┆Issue is synchronized with this Jira Task by Unito

DB View for Pod/Service

Describe the use-case
Need to see which of the supported DB protocols a pod/service talks to.

Example scenario
I.e., I would like to know that my service talks to Cassandra, Postgres, etc.

Desired output
Table that links to the details page for that db.

Could not see JVM stats

// filed on behalf of @berilane-te @shashank-te

Describe the bug

JVM stats query fails to retrieve jvm metrics

To Reproduce

Run:

px.display(px.DataFrame(table='jvm_stats', start_time='-30s'))

Logs

jvm info:

md_line: java -server -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.rmi.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=127.0.0.1 -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMPercentage=70.0 -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -jar app.jar --spring.config.additional-location=file:/te/config/, 

App information (please complete the following information):

  • Pixie version: v0.1.18

Additional context

`px get pem` doesn't work on minikube.

Describe the bug
Failed to get pem.

To Reproduce
Steps to reproduce the behavior:

  1. px deploy
  2. px get pem

Expected behavior
Expected to get pem.

Screenshots
n/a

Logs

$ px deploy
[0000]  INFO Pixie CLI

  ___  _       _
 | _ \(_)__ __(_) ___
 |  _/| |\ \ /| |/ -_)
 |_|  |_|/_\_\|_|\___|

Deploying to Pixe Cluster ID: 0d089cc7-0f1c-40a2-b856-68b12964ce1a.
Deploying Pixie to the following cluster: minikube

Is the cluster correct? (y/n) [y] : y
Found 1 nodes
 ✔    Creating namespace 
 ✔    Installing certs 
 ✔    Loading secrets 
 ✔    Updating clusterroles 
 ✔    Downloading Vizier YAMLs 
 ✔    Deploying etcd 
 ✔    Deploying NATS 
 ✔    Deploying Vizier 
Waiting for services and pods to start...
Waiting for PEMs to deploy ...
Node 1/1 instrumented
PEMs successfully deployed

$ px get pem
[0000]  INFO Pixie CLI

  ___  _       _
 | _ \(_)__ __(_) ___
 |  _/| |\ \ /| |/ -_)
 |_|  |_|/_\_\|_|\___|

[0000]  INFO Selected Vizier address addr=https://1583921831930290000.c2321487.clusters.withpixie.ai:30003
Executing Script:  8d5f63a0-461d-4866-a850-38770ff05098
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1332fe0]

goroutine 1 [running]:
pixielabs.ai/pixielabs/src/utils/pixie_cli/cmd.formatResultsAsTable(0xc000318220)
	src/utils/pixie_cli/cmd/run.go:170 +0x40
pixielabs.ai/pixielabs/src/utils/pixie_cli/cmd.mustFormatQueryResults(0xc000318220, 0x0, 0x0)
	src/utils/pixie_cli/cmd/run.go:154 +0x8f
pixielabs.ai/pixielabs/src/utils/pixie_cli/cmd.glob..func7(0x22c93e0, 0xc0002ce910, 0x1, 0x1)
	src/utils/pixie_cli/cmd/get.go:36 +0x223
github.com/spf13/cobra.(*Command).execute(0x22c93e0, 0xc0002ce8c0, 0x1, 0x1, 0x22c93e0, 0xc0002ce8c0)
	external/com_github_spf13_cobra/command.go:844 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x22ca100, 0x18243e0, 0xc00010ee00, 0x0)
	external/com_github_spf13_cobra/command.go:945 +0x317
github.com/spf13/cobra.(*Command).Execute(...)
	external/com_github_spf13_cobra/command.go:885
pixielabs.ai/pixielabs/src/utils/pixie_cli/cmd.Execute()
	src/utils/pixie_cli/cmd/root.go:114 +0x96
main.main()
	src/utils/pixie_cli/px.go:61 +0x571

App information (please complete the following information):

  • Pixie version: 0.1.21+Distribution.1ccac93.20200310091514.1
  • K8s cluster version: v1.14

Additional context
n/a

Unable to parse gRPC http2 requests

// filed on behalf of @berilane-te @shashank-te

Describe the bug
Unable to see parsed protobuf data for Thanos requests

To Reproduce

  1. Ran the "sample http_events" script

  2. Ran the query below

t1 = px.DataFrame(table='http_events', select=['time_', 'remote_addr', 'remote_port', 'http_resp_status', 'http_resp_message', 'http_resp_body', 'http_resp_latency_ns', 'http_major_version'], start_time='-30s')
t1.http_resp_latency_ms = t1.http_resp_latency_ns / 1.0E6
t1 = t1[t1['http_major_version'] == 2]
t2 = t1.drop(columns=['http_resp_latency_ns']).head(n=100)
px.display(t2)

Expected behavior
Expected to see parsed data.

Logs
n/a

App information (please complete the following information):

  • Pixie version: v0.1.18

┆Issue is synchronized with this Jira Task by Unito

MySQL Live View

Describe the question you are looking to answer
Analyze the statistics of a service's MySQL connection.
Should include latency of requests over time

Example scenario
A service experiences SLO violations and the problem can't be diagnosed by looking at the services it depends on, so you want to look at its database interactions. It turns out database request latencies are spiking; you can then switch to another live view to see the rates of database interactions across all services in the cluster and take action appropriately.

Desired output
Output Widgets

  1. Latency (L)
  2. Errors (E)
  3. Throughput (T)
  4. (Stretch) Top-K queries sent

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? yes
  • Do you want to persist this view in your team's git repo (yes/no)? yes
  • Do you want to persist this view in pixie-community's git repo (yes/no)? yes

(Optional) Draft scripts
I think we can adapt the service_info view to do this as well.

###############################################################
# The following can be edited to k8s object and name to match.
###############################################################
# The Kubernetes object to filter on.
#    Options are ['pod', 'service']
k8s_object = 'pod'

# If you want to filter the object by name, enter the partial
# or full name here.
match_name = 'sock-shop'

# Names of output columns
src_name = 'client'
dest_name = 'server'
###############################################################
ip = 'remote_addr'
df = px.DataFrame(table='mysql_events', start_time='-60s')

df[k8s_object] = df.ctx[k8s_object]
df = df[df[k8s_object] != '' and px.contains(df[k8s_object], match_name)]
#px.display(df.head(100))
df = df.groupby([k8s_object, ip]).agg(count=(ip, px.count))

df[src_name] = df[k8s_object]
df[dest_name] = df[ip]
df['pod_id'] = px.ip_to_pod_id(df[ip])
df['requestor_pod'] = px.pod_id_to_pod_name(df.pod_id)
# Enable if you want pod name
df['requestor_service'] = px.pod_id_to_service_name(df.pod_id)
px.display(df[[src_name, dest_name, 'requestor_pod', 'count']])

┆Issue is synchronized with this Jira Task by Unito

Support Deployment mapping (similar to Service/Pod)

Is your feature request related to a problem? Please describe.
Deployments are another type of k8s object that is mostly distinct from services and pods. Most importantly, it's possible that 2+ deployments back the same service so services might not be sufficient k8s objects in the future.

Describe the solution you'd like
Be able to query deployments like we query services/pods.

Describe alternatives you've considered
Most of the time services represent deployments 1:1, so you could use service queries to achieve what you want from Deployments. However, k8s doesn't have any enforced constraint for this, and it's possible for deployments not to back a service at all.

Additional context
Add any other context or screenshots about the feature request here.

┆Issue is synchronized with this Jira Bug by Unito

Pin point jvm memory leak

// filing on behalf of @tuxology so that we can track

Example scenario
So here is what happened. We have a runtime service which customers use; it's an agent that gets attached to customers' Java apps. Java agents are notorious for obvious reasons (security and performance), but apparently someone in the industry decided it was OK to use them. So we shipped an agent which accidentally had a memory leak. The JVM handles the memory and would run GC intermittently.

We could not identify this as part of the customer's container (which had lots of workloads), so I had to isolate it locally, and the moment I saw the sawtooth waveform I knew from experience what it was.

https://twitter.com/tuxology/status/1019317134743048192?s=20

https://twitter.com/tuxology/status/1136683456451055617?s=20

I built the infra to discover this using Grafana + periodic tests, so the next time we discovered it faster (see the Grafana charts). If Px had been available, I think we would have been able to discover this in production, by specifically collecting memory metrics filtered by the PPID in the container.

Desired output
Output Widgets

  1. Recreate this scenario with a sample java app
  2. Explain what is happening
  3. Show how we can identify it with VisualVM locally
  4. Show how we can identify it when it's deployed in a cluster by pinpoint analysis with `px`

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? n/a
  • Do you want to persist this view in your team's git repo (yes/no)? n/a
  • Do you want to persist this view in pixie-community's git repo (yes/no)? n/a

Additional requirements
n/a

(Optional) Draft scripts
n/a

┆Issue is synchronized with this Jira Bug by Unito

Service SLI View w/ Dependency Tables

// filing on behalf of @berilane-te @shashank-te

Describe the question you are looking to answer

When investigating a specific service, I want to understand the health of the service's down-stream dependencies (databases) and up-stream dependencies (ingress, other services, etc.).

Example scenario

A frequent debugging scenario is a service's latency spiking when its database reads/writes spike. For an SRE or SWE debugging this, it's difficult to know whether the db is trashed because of their service or because of other issues.

Desired output

  1. Service SLI Metrics Charts (latency, error-rate, rps)
  2. Service-specific K8s metrics (CPU, Memory, IOPS etc. by pod)
  3. Downstream Dependencies Summary Table ( SLI Metrics by dependency)
  4. Upstream Dependencies Summary Table ( SLI Metrics by dependency)

Additional requirements

  • We should be able to navigate to the dependency's own live view from the table
  • Ideally, we would like to view the dependencies in a format that's easy to understand (eg: topological map)

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? Yes
  • Do you want to persist this view in your team's git repo (yes/no)? Yes
  • Do you want to persist this view in pixie-community's git repo (yes/no)? n/a

(Optional) Draft scripts
n/a

Issues with v1beta apiVersion with px demo

Describe the bug

ℹ️ This is low priority IMO

Issues with default sock shop px demo due to apiVersion

To Reproduce
Steps to reproduce the behavior:

px demo deploy px-sock-shop
[0000]  INFO Pixie CLI
Deploying demo app px-sock-shop to the following cluster: px-test

Is the cluster correct? (y/n) [y] : y
 ✔    Creating namespace px-sock-shop 
 ✕    Deploying px-sock-shop YAMLs  ERR: no matches for kind "Deployment" in version "extensions/v1beta1"
[0002] FATAL Failed to deploy demo application. error=no matches for kind "Deployment" in version "extensions/v1beta1"

Expected behavior
Demo works seamlessly or informs the user what to do for upgrades

App information (please complete the following information):

  • 0.1.25+Distribution.59beb5c.20200326150522.1
  • K8s cluster version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

Additional context
I have seen this previously; the usual workaround for me is to update the apiVersion (for example here: https://github.com/microservices-demo/microservices-demo/blob/master/deploy/kubernetes/complete-demo.yaml#L1) to do something like this:

+++ b/deploy/kubernetes/complete-demo.yaml
@@ -1,4 +1,4 @@
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: carts-db
@@ -6,11 +6,14 @@ metadata:
     name: carts-db
   namespace: sock-shop
 spec:
+  selector:
+    matchLabels:
+      app: carts-db
   replicas: 1
   template:
     metadata:
       labels:
-        name: carts-db
+        app: carts-db
     spec:
       containers:
       - name: carts-db
@@ -52,7 +55,7 @@ spec:
   selector:
     name: carts-db
 ---

┆Issue is synchronized with this Jira Bug by Unito

Carts pod in px-sock-shop demo experienced image pull error

Describe the bug
The px-sock-shop service doesn't fully start up for load testing to succeed, because the carts pod didn't successfully pull the image.

To Reproduce
Steps to reproduce the behavior:

  1. minikube start on Mac
  2. install px
  3. px demo deploy px-sock-shop
  4. kubectl -n px-sock-shop get pods

Expected behavior
carts pod successfully running

Logs
kubectl px-sock-shop describe

Successfully assigned px-sock-shop/carts-84cd4fbf9b-k2cbs to minikube
  Warning  Failed     3m17s               kubelet, minikube  Failed to pull image "weaveworksdemos/carts:0.4.8": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 172.16.154.2:53: read udp 172.16.154.135:49071->172.16.154.2:53: i/o timeout
  Warning  Failed     3m17s               kubelet, minikube  Error: ErrImagePull
  Normal   BackOff    3m16s               kubelet, minikube  Back-off pulling image "weaveworksdemos/carts:0.4.8"
  Warning  Failed     3m16s               kubelet, minikube  Error: ImagePullBackOff
  Normal   Pulling    3m6s (x2 over 10m)  kubelet, minikube  Pulling image "weaveworksdemos/carts:0.4.8"

App information (please complete the following information):

  • px version returns 0.1.27+Distribution.6e8b31d.20200403111855.1
  • K8s cluster version was v1.17.0

Additional context
His machine was resource constrained with slow internet, so that could be the problem.

Path expansion issue in install script creates a bad directory for px install

Describe the bug
Wrong installation path on Linux with installer script

To Reproduce
Steps to reproduce the behavior:

  1. curl -fsSL https://withpixie.ai/install.sh | bash
  2. Logs show the following
- PX CLI has been installed to: INSTALL_PATH=${INSTALL_PATH:-${DEFAULT_INSTALL_PATH}}. Make sure this directory is in your PATH.

It creates a directory with that name and pixie is inside:

$ ls INSTALL_PATH=\$\{INSTALL_PATH:-\$\{DEFAULT_INSTALL_PATH\}\} 
px

Expected behavior

  1. The install script should either prompt for installation path.
  2. If non-interactive mode is used as a standard, it should expect an installation path arg or should install the bin in the same directory from where it is called

Version/info

[0000]  INFO Pixie CLI
0.1.25+Distribution.59beb5c.20200326150522.1

Additional context
Running on Elementary OS (Ubuntu 18.04 derivative) with zsh. But pixie spawns bash for install script

┆Issue is synchronized with this Jira Bug by Unito

Show list of sample views in live ui

// filing based on brainstorming with @berilane-te @shashank-te

Is your feature request related to a problem? Please describe.
The learning curve to write scripts to build live views from scratch is steep. Having a list of sample live views would be helpful for analysis & composing new views

Describe the solution you'd like
To start, having a list of sample live views similar to the console will be great. Some examples discussed:

  • Overview of all services (similar to HTTP stats per service script)
  • Service-specific view (#19)
  • Database centric view based on external endpoints

Additional context

  • This is already in plan as part of the live ui roadmap

cc @malthusyau @aimichelle

Pod Live View is missing expected column (container) for CPU and memory widgets

Describe the bug
The vis.json expects a column called 'container' but it doesn't exist for the resource_timeseries table, causing the container to be omitted from the timeseries chart label.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the px/pod view and look at the CPU chart.

Expected behavior
The container name should show up in the legend.

No validation on UUID submitted via px command line tool

Describe the bug
When using px to deploy to a cluster, a user can pass in their own cluster_id. Other components have requirements on the format of this id, so an invalid value leads to a failure to boot.

To Reproduce
Steps to reproduce the behavior:

  1. px deploy --cluster_id="hello"
  2. kubectl logs deployment/vizier-cloud-connector vizier-cloud-connector-server -n pl | grep UUID

This results in the following error:

time="2020-02-26T23:54:20Z" level=fatal msg="Could not parse cluster_id" error="uuid: incorrect UUID length: hello"

Expected behavior
The px command line tool should validate that the UUID is valid and has a valid format

App information (please complete the following information):

  • Pixie version 0.1.15+Distribution.acd8f53.20200207161509.1
  • K8s cluster version Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.7-gke.23", GitCommit:"06e05fd0390a51ea009245a90363f9161b6f2389", GitTreeState:"clean", BuildDate:"2020-01-17T23:10:45Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

Upgrade service_info script

Goals:

  • Improve the usefulness of service_info script by adding dependency tables
  • Make script compact (target <10 lines) to make it easier for non-pixies to understand & tweak pxl scripts

Desired Output

﹍Live View﹍

  • Input Arguments
    • Service Name
  • Output Widgets
    • Latency - p50
    • Latency - p90
    • Latency - p99 (if we can fit it)
    • Error Rate (as % of total requests)
    • Throughput (as rps)
    • CPU% (by pod)
    • Resident Memory Size (by pod)
    • List of Dependencies with following columns (will require table widget support which we don't have)
      • Service_Name (n/a if we don't have it)
      • End-point IP
      • Type (downstream or upstream)
      • Latency - p50
      • Latency - p90
      • Latency - p99
      • Error Rate
      • RPS
      • CPU % (if we have it)
      • Resident Memory Size (if we have it)

﹍CLI Output﹍

  • Command: px run px/service_info
    • Output: ask to enter service-name arg. suggest running px get services to get a list of services
  • Command: px run px/service_info -s sock-shop/front-end

User-Story

See: #19

Describe the question you are looking to answer

When investigating a specific service, I want to understand the health of the service's down-stream dependencies (databases) and up-stream dependencies (ingress, other services, etc.).

Example scenario

A frequent debugging scenario is a service's latency spiking when its database reads/writes spike. For an SRE or SWE debugging this, it's difficult to know whether the db is trashed because of their service or because of other issues.

Desired output

  • Service SLI Metrics Charts (latency, error-rate, rps)
  • Service-specific K8s metrics (CPU, Memory, IOPS etc. by pod)
  • Downstream Dependencies Summary Table ( SLI Metrics by dependency)
  • Upstream Dependencies Summary Table ( SLI Metrics by dependency)

Additional requirements

  • We should be able to navigate to the dependency's own live view from the table
  • Ideally, we would like to view the dependencies in a format that's easy to understand (eg: topological map)

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? Yes
  • Do you want to persist this view in your team's git repo (yes/no)? Yes
  • Do you want to persist this view in pixie-community's git repo (yes/no)? n/a

cc [~accountid:5bc4fa648997913f6f1c5ef0] [~accountid:5b8f208e497b882c77e91ab4] [~accountid:5d925e04f5d3c10d8bfd071a] [~accountid:5b8f7f3e8aaa0f2bd11fa903]

┆Issue is synchronized with this Jira Story by Unito

Track lifespan of TCP sessions between pods.

// filing on behalf of @11Takanori

Describe the question you are looking to answer
I want to trace the lifespan of TCP sessions between pods. My idea is as follows.

LPOD   LPORT  RPOD   RPORT  TX_KB  RX_KB  MS
Pod A  54146  Pod B  8080   133    97     3.99
Pod A  54151  Pod B  8080   131    96     4.10
Pod B  32644  Pod C  3000   345    228    8.41
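
For illustration, a hedged PxL sketch approximating this table might look like the following; the conn_stats table name and the bytes_sent/bytes_recv column names are assumptions, not something confirmed on this page:

# Hedged sketch: bytes exchanged per pod/remote-address pair over the last minute.
import px

df = px.DataFrame(table='conn_stats', start_time='-60s')
df.pod = df.ctx['pod']
df = df.groupby(['pod', 'remote_addr']).agg(
    tx_bytes=('bytes_sent', px.sum),
    rx_bytes=('bytes_recv', px.sum),
)
px.display(df)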

Example scenario

I would like to trace which new TCP connections are created and their duration, look for inefficiencies, and identify network traffic that is not working properly between specific pods.

Desired output
Output Widgets

  1. Widget A: Table
  2. Widget B: Not sure of others

Desired collaboration scheme

  • Do you want to share this view with your team (yes/no)? n/a
  • Do you want to persist this view in your team's git repo (yes/no)? n/a
  • Do you want to persist this view in pixie-community's git repo (yes/no)? n/a

Additional requirements
n/a

(Optional) Draft scripts
n/a

cc @zasgar @philkuz

┆Issue is synchronized with this Jira Bug by Unito

Drill Down in the UI

Is your feature request related to a problem? Please describe.
Would like to take a live view such as a service-level overview and drill down into details about individual Service-level views.

Describe the solution you'd like
e.g.
SLO violation occurs on the front-end service. The SRE/SWE goes to the live view for front-end, sees that traffic between front-end and the user service has a huge spike in latency, and wants to look at the user-service view to drill down.

Describe alternatives you've considered
The alternative is to manually jump around dashboards and update the variables, which could get very clunky, easy to mess up, and hard to revert.
Additional context
Add any other context or screenshots about the feature request here.

┆Issue is synchronized with this Jira Bug by Unito

Unable to create new account

reported by @mikolajkosmal and repro'd by @ishanmkh

Describe the bug
Unable to create a new account via gsuite login. Get the following error:

Screen Shot 2020-03-21 at 6 12 25 PM

To Reproduce
Steps to reproduce the behavior:

  1. Go to sign-up page
  2. Click on signup with google
  3. See error

Expected behavior
Successful login

Screenshots
attached above

Logs
n/a

App information (please complete the following information):

  • Pixie version: v0.1.21
  • K8s cluster version: n/a

Additional context

  • n/a

Services and pods sometimes don't show up on http traffic on small clusters

Describe the bug
Filing on behalf of @jhunt. In minikube, sometimes the metadata service gets flaky which leads to no pods or services or http traffic being attributed to pods/services. As a result, many of the live views are blank.

To Reproduce
Steps to reproduce the behavior:
Run Pixie and px-sock-shop on minikube and go to the px/http_request_stat script; see that there are no results. Go to the px/http_data script; see that there are in fact results from px-sock-shop, but with no pod or service attributed.

Expected behavior
All of the services and pods to show up.

Logs
vizier-metadata

time="2020-07-01T17:43:33Z" level=info msg="HTTP Request" req_method=POST req_path="/pl.vizier.services.metadata.MetadataService/GetAgentTableMetadata" resp_code=200 resp_size=70069 resp_status=OK resp_time=8.539313ms
E0701 17:43:34.206016       1 leaderelection.go:356] Failed to update lock: Put https://10.96.0.1:443/api/v1/namespaces/pl/endpoints/metadata-election: context deadline exceeded
I0701 17:43:34.206194       1 leaderelection.go:277] failed to renew lease pl/metadata-election: timed out waiting for the condition
E0701 17:43:34.206443       1 leaderelection.go:296] Failed to release lock: resource name may not be empty
time="2020-07-01T17:43:34Z" level=error msg="leadership lost: vizier-metadata-5b4cf4b6c8-mwpr6"
panic: leadership lost

goroutine 43 [running]:
pixielabs.ai/pixielabs/src/shared/services/election.(*K8sLeaderElectionMgr).runElection.func2()
	src/shared/services/election/election.go:118 +0xe9
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc0000c2480)
	external/io_k8s_client_go/tools/leaderelection/leaderelection.go:200 +0x40
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0000c2480, 0x1cd0220, 0xc000345300)
	external/io_k8s_client_go/tools/leaderelection/leaderelection.go:209 +0x15e
k8s.io/client-go/tools/leaderelection.RunOrDie(0x1cd0220, 0xc0001c80c0, 0x1ce15a0, 0xc0001cc280, 0xb2d05e00, 0x3b9aca00, 0xee6b280, 0x1b0d9b8, 0xc0003d9220, 0xc0001ac130, ...)
	external/io_k8s_client_go/tools/leaderelection/leaderelection.go:221 +0x96
pixielabs.ai/pixielabs/src/shared/services/election.(*K8sLeaderElectionMgr).runElection(0xc0000c0230, 0x1cd0220, 0xc0001c80c0, 0xc0001ac130, 0xc000436020, 0x20)
	src/shared/services/election/election.go:98 +0x2cf
created by pixielabs.ai/pixielabs/src/shared/services/election.(*K8sLeaderElectionMgr).Campaign
	src/shared/services/election/election.go:138 +0x10b

Liveness probe failed

App information (please complete the following information):

  • Pixie version: 0.3.8
  • K8s cluster version: v1.18.3

Additional context
Minikube, 3-node, running containerd, atop KVM

Can't deploy Pixie with Kind

Describe the bug
PEMs don't collect any data (process_stats, network_stats, etc.)

To Reproduce
Steps to reproduce the behavior:

  1. create kind cluster
  2. install px
  3. deploy pixie
  4. run any scripts

Expected behavior
Should deploy as expected

Screenshots
n/a

Logs
pixie_logs_20200424021521.zip

App information (please complete the following information):

  • Pixie version
  • K8s cluster version

Additional context
Add any other context about the problem here.

Account auth failed after claiming site

Describe the bug
Auth0 returns an error (screengrab below) after claiming the site

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://withpixie.ai/create
  2. Click on continue
  3. Auth via Google
  4. See error

Expected behavior

Sign-up with google should be successful and should drop into the deployment instructions.

Screenshots

Screen Shot 2020-03-06 at 10 08 04 AM

Logs

TBD

App information (please complete the following information):

  • Pixie version: v0.1.16
  • K8s cluster version: n/a

Additional context

Pixie console disappears and reverts to instructions ~30 min after deploying pixie

Describe the bug
Around 30 min after deploying Pixie, the main webpage stops showing the console and reverts back to the instructions page.
image

kubectl get pods -n pl shows no issues

kelvin-6d8df6f584-45jhf                  1/1     Running   0          33m
nats-operator-5d45f9ffd6-6gqds           1/1     Running   0          33m
pl-etcd-0                                1/1     Running   0          33m
pl-nats-1                                1/1     Running   0          33m
vizier-api-89f599c87-dpxj5               1/1     Running   0          33m
vizier-certmgr-764f7c76bd-4l8gg          1/1     Running   0          33m
vizier-cloud-connector-7444d48f9-2z2hx   1/1     Running   0          33m
vizier-metadata-6b85656996-n2r6x         1/1     Running   0          33m
vizier-pem-4m2lf                         1/1     Running   0          33m
vizier-pem-7jwpr                         1/1     Running   0          33m
vizier-pem-8jnvl                         1/1     Running   0          33m
vizier-pem-qq4rg                         1/1     Running   0          33m
vizier-proxy-b4cfb96c9-tmtw5             1/1     Running   0          24m
vizier-query-broker-5c464c494-5mmq7      1/1     Running   0          33m

To Reproduce

  1. Deploy Pixie
  2. Try some queries
  3. Wait ~30 min after deploy
  4. Check https://withpixie.ai and see instructions page again

Expected behavior
Be able to query after 30 minutes

App information (please complete the following information):

  • Pixie version: v0.1.18
  • K8s cluster version v1.16

When will live be live?

Is your feature request related to a problem? Please describe.
I have a need to debug fast

Describe the solution you'd like
Faster than anything humankind has ever seen

Describe alternatives you've considered
All of them

Additional context
This is a test so please ignore.

Modify Pod Live View query to only fetch data from the pod's agent, rather than fanning out to all agents.

Is your feature request related to a problem? Please describe.
The Pod Live View currently looks at outbound traffic from a given pod. As a result, the query generated for this view will need to run on every agent in a cluster. For larger clusters, that makes this pod view less efficient. By omitting outbound traffic, we can make this view much faster. If it is important to see the traffic out of a pod, the Namespace Live View provides that information in a service graph. In addition, px/pod_edge_stats will show an even more detailed breakdown of traffic between two pods.

Describe the solution you'd like
Pod Live View to only run on a single agent.

Describe alternatives you've considered
Introducing another Pod live view for this purpose, a copy of the current one with outbound traffic removed. However, it currently seems better for the default Pod view to be fast and other existing views to be utilized for the outbound traffic information.

Additional context
N/A

`px auth login` should print out the URL

Is your feature request related to a problem? Please describe.
From Roopak: "during pixie login list the auth url you open. I have multiple chrome profiles, it opened the wrong one (nothing you can do), i had to cancel move the command to the right screen and then click again. if you had the url printed on the console, it would be easier"
