Istio Cost Analyzer

The Istio Cost Analyzer is a tool that allows you to analyze the costliest workload links in your cluster. It relies on Kubernetes/Istio and Prometheus to gather data, and uses publicly-available cloud egress rates to estimate the overall egress costs of your services.

Usage

To use this on your Kubernetes cluster, make sure you have a kubeconfig in your home directory and that Istio is installed on your cluster with the Prometheus addon enabled. You must also have a healthy Istio Operator available.

Installation

To install the istio-cost-analyzer binary:

go install github.com/tetratelabs/istio-cost-analyzer@latest

Setup

The setup command does a few things:

  • Edits the Istio Operator config to add custom Prometheus metrics (a destination_locality label on an Istio metric).
  • Creates a mutating webhook that is called when a new deployment is created. The webhook runs in a pod and has associated RBAC permissions, Services, etc.
  • Labels existing pods and deployments in the namespace given by --targetNamespace.

Running the following command sets up a webhook that handles everything, both all existing Deployments and any Deployments created in the future:

istio-cost-analyzer setup
Flag | Description | Default Value
--- | --- | ---
`targetNamespace` | Namespace that the cost analyzer will watch/analyze | `default`
`analyzeAll` (`-a`) | Analyze all namespaces. Don't set this together with `targetNamespace`. | `false`
`cloud` | Cloud on which your cluster is running (node info varies from cloud to cloud) | Inferred from Node info
`analyzerNamespace` | Namespace in which the cost analyzer config will live (you usually don't need to set this) | `istio-system`

Running

Run:

istio-cost-analyzer analyze
Flag | Description | Default Value
--- | --- | ---
`cloud` | Cloud on which your cluster is running (node info varies from cloud to cloud). Options are `gcp` or `aws`. If you are on GCP or AWS you don't need to set this, as it is inferred. | Inferred from Node info
`prometheusNamespace` | Namespace in which the Prometheus pod exists (you usually don't need to set this) | `istio-system`
`pricePath` | For non-standard AWS/GCP rates (on-prem, negotiated rates). If you set this, you don't need to set `cloud`. See `/pricing`. | None
`details` | Extended table view that shows both destination and source workload/locality, instead of just the source. | `false`
`start` | RFC3339 UTC timestamp indicating when to start analyzing data. | `0` (beginning)
`end` | RFC3339 UTC timestamp indicating when to stop analyzing data. | `time.Now()`
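The --start/--end flags take RFC3339 UTC timestamps. A small sketch for building a 24-hour analysis window (GNU date assumed); the last line only prints the resulting invocation so you can inspect it before running:

```shell
# Build an RFC3339 UTC window covering the last 24 hours.
# GNU date assumed; on macOS/BSD use `date -u -v-24H +%Y-%m-%dT%H:%M:%SZ` instead.
START=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Print the resulting invocation (drop the `echo` to actually run it):
echo istio-cost-analyzer analyze --details --start "$START" --end "$END"
```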

The output should look like this (without --details):

Total: <$0.01

SOURCE WORKLOAD	SOURCE LOCALITY	COST   
productpage-v1 	us-west1-b     	<$0.01	
reviews-v2     	us-west1-b     	-     	
reviews-v3     	us-west1-b     	-  

With --details:

Total: <$0.01

SOURCE WORKLOAD	SOURCE LOCALITY	DESTINATION WORKLOAD	DESTINATION LOCALITY	TRANSFERRED (MB)	COST   
productpage-v1 	us-west1-b     	details-v1          	us-west1-c          	0.173250        	<$0.01	
productpage-v1 	us-west1-b     	reviews-v1          	us-west1-b          	0.058500        	-     	
productpage-v1 	us-west1-b     	reviews-v2          	us-west1-b          	0.056250        	-     	
productpage-v1 	us-west1-b     	reviews-v3          	us-west1-b          	0.058500        	-     	
reviews-v2     	us-west1-b     	ratings-v1          	us-west1-b          	0.056150        	-     	
reviews-v3     	us-west1-b     	ratings-v1          	us-west1-b          	0.058400        	-    
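The COST column is, roughly, the transferred volume multiplied by the per-GB egress rate for the source/destination locality pair (rates come from the /pricing data). A hedged sketch of that arithmetic, using the first row above and a hypothetical $0.01/GB inter-zone rate:

```shell
# Hypothetical rate; real rates come from /pricing, per cloud and locality pair.
awk 'BEGIN {
  mb   = 0.173250   # TRANSFERRED (MB) for productpage-v1 -> details-v1 above
  rate = 0.01       # hypothetical $/GB inter-zone egress rate
  printf "$%.6f\n", (mb / 1024) * rate
}'
# prints $0.000002, i.e. the "<$0.01" shown in the table
```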

Cleanup

If you want to restart installation of the tool or don't want it in your cluster anymore, you can run:

istio-cost-analyzer destroy

If you set --analyzerNamespace in the setup command, you must pass the same flag here.

You must also edit your Istio Operator config to remove the custom Prometheus metrics. (You can use -o to do that here, but it's unstable.)

Contributors

adiprerepa, dependabot[bot], pmerrison

istio-cost-analyzer's Issues

add azure support

Currently, we only support AWS and GCP pricing (selected by --cloud). We have already generated the pricing structures for AWS/GCP in /pricing, so adding Azure shouldn't be that bad.

There might be some regex-matching logic to add to the main cost-analysis codepath, but our locality-matching logic is pretty cloud-agnostic. Adding an option for Azure should entail roughly the same work in the main codepath as running the tool with --pricePath pointed at Azure data.

Data rates: https://azure.microsoft.com/en-us/pricing/details/bandwidth/

The structure looks similar to GCP's, so we can probably just modify the gcp_rate_converter.go script to match Azure localities.
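The locality-matching logic being cloud-agnostic rests on localities having a common shape: a region followed by a zone suffix, e.g. us-west1-b in the GCP tables above. A sketch of splitting one with plain shell parameter expansion (the GCP form is assumed; other clouds may need the regex handling mentioned above):

```shell
# Localities look like "<region>-<zone letter>", e.g. us-west1-b on GCP.
locality="us-west1-b"
region="${locality%-*}"    # strip the final "-<zone>" suffix -> us-west1
zone="${locality##*-}"     # keep only the final component    -> b
echo "$region $zone"       # prints: us-west1 b
```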

Document/Improve egress rate fetchers

Document the scripts in pricing/gcp and pricing/aws, both in comments and in directory-level READMEs. These scripts fetch/derive rate information.

The scripts are also written very linearly, with no abstraction. That's probably fine for scripts, but it might be helpful to at least break them into functions.

Automatically edit Istio Operator config

We currently require the user to insert some custom Prometheus config into their Operator config (see README.md).

We should take care of this automatically in the setup command, and let the user specify which Istio Operator they want to edit with two flags: --operatorName & --operatorNamespace.

Does not work on GKE with Istio 1.18.2

Environment

  • Kubernetes 1.26.5
  • Istio 1.18.2
  • GKE

Set up istio-cost-analyzer on Istio 1.18.2, deployed the Bookinfo sample application, and kept requesting the productpage service, but got no results:

$ istio-cost-analyzer analyze
found cloud: GCP
using pricing file: https://raw.githubusercontent.com/tetratelabs/istio-cost-analyzer/master/pricing/gcp/gcp_pricing.json
Waiting for prometheus to be ready...
Prometheus is ready! (Code: 200)
EFNcalculating egress costs for 0 call links

Total: -

SOURCE SERVICE  SOURCE LOCALITY COST

Add unit tests to `istio-cost-analyzer`

We have no tests. Should add them for:

  • cost.go -> calculating cost on different call scenarios
  • kube.go -> node logic & helper functions
  • call.go -> print table logic

docs for operator setup/destroy

Automatic operator config setup/destruction was recently added; this needs to be documented both in README.md and in comments.

add namespace column to cost output

Since we support analysis across multiple namespaces, there might be service name conflicts, so we must add a namespace column to further describe each entry.

Should be as simple as taking the destination_namespace and namespace labels from the prom metric.

add CI for test/release

We should push the mutating webhook images to Docker on every release, and add automatic tests/linting on PR creation.

Create injector for `destination_pod` labeling

Instead of needing to add the destination_pod label to all the deployments and to the operator (as seen in the readme), we should create an injector to automatically provide this configuration.

Use Kubernetes-type tables instead of current

Currently, we use tables that look like this:

Total: <$0.01

  SOURCE WORKLOAD | SOURCE LOCALITY |  COST   
------------------+-----------------+---------
  productpage-v1  | us-west1-b      | <$0.01  
  reviews-v2      | us-west1-b      | -       
  reviews-v3      | us-west1-b      | -   

We should use tables that look like the kubectl output, like (from kubectl get pods):

NAME                            READY   STATUS    RESTARTS   AGE
details-v1-777558dcb7-d89wt     2/2     Running   0          17d
productpage-v1-54d9bb5c-rmvg7   2/2     Running   0          17d
ratings-v1-5d95594b9f-8kjxh     2/2     Running   0          17d
reviews-v1-84c6f45fd5-dw974     2/2     Running   0          17d
reviews-v2-f5d4f955f-j9wft      2/2     Running   0          17d
reviews-v3-589dbbf77-2ljgm      2/2     Running   0          17d

Start using logrus/better logging

We kind of just scatter fmt.Printf or log.Printf everywhere; we should probably use something like logrus so we can utilize scopes.

We should have a normal output mode (no flags) and a verbose output mode (-v). The normal output mode is the baseline, while verbose mode should have debug info.

Normal:

Waiting for prometheus to be ready...
Prometheus is ready! (Code: 200)
Total: -

SOURCE WORKLOAD	SOURCE LOCALITY	DESTINATION WORKLOAD	DESTINATION LOCALITY	TRANSFERRED (MB)	COST 
productpage-v1 	us-west1-c     	details-v1          	us-west1-c          	1.070870        	-   	
productpage-v1 	us-west1-b     	reviews-v1          	us-west1-b          	0.356600        	-   	
productpage-v1 	us-west1-b     	reviews-v2          	us-west1-b          	0.355875        	-   	
productpage-v1 	us-west1-b     	reviews-v3          	us-west1-b          	0.356490        	-   

Verbose:

Waiting for prometheus to be ready...
Prometheus is ready! (Code: 200)
Total: -

us-west1-b(productpage-v1) -> us-west1-c(details-v1): 1070870  |  link 0 / 4
<other debug runtime info>

SOURCE WORKLOAD	SOURCE LOCALITY	DESTINATION WORKLOAD	DESTINATION LOCALITY	TRANSFERRED (MB)	COST 
productpage-v1 	us-west1-c     	details-v1          	us-west1-c          	1.070870        	-   	
productpage-v1 	us-west1-b     	reviews-v1          	us-west1-b          	0.356600        	-   	
productpage-v1 	us-west1-b     	reviews-v2          	us-west1-b          	0.355875        	-   	
productpage-v1 	us-west1-b     	reviews-v3          	us-west1-b          	0.356490        	-   

The goal is that users can run the tool with -v when they have issues and possibly get insight into them; we could even make it part of the repository issue template.

count gateways in cost traffic

The easy way to do this is to just add the gateway option to the operator configOverride, but there are some upstream/downstream_peer semantics to be worked out. Right now, cost doesn't really account for gateway traffic at all.

We also count both inboundSidecar and outboundSidecar, when we should probably only count one.

Support analysis across multiple namespaces

Currently, we only support analysis across a single namespace (--namespace). We should allow this argument to take in a list, and label all pods/deployments in those namespaces.

As long as istio-injection=enabled is labeled on these namespaces, Prometheus should identify workloads in these namespaces and include them in the metrics.

We would probably want to separate the output by Workload/Locality/Namespace pairings instead of just Workload/Locality, as we do now. This way, all the data can exist in the same table, uniformly.

Note: for updating pods, we use SharedInformer, which only takes one namespace. I think this is being worked on (?) in kubernetes/kubernetes#74415, but until that gets implemented, we should keep an index of informers by namespace and run them all on separate goroutines. Tedious, but necessary.
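Meanwhile, the labeling step itself extends naturally from one namespace to a list; a sketch (the namespace names and label key are hypothetical, and the commands are printed rather than run):

```shell
# Hypothetical namespace list and label key -- adjust for your cluster.
# Drop the `echo` to actually apply the labels with kubectl.
for ns in default bookinfo payments; do
  echo kubectl label deployments --all -n "$ns" cost-analyzer=enabled --overwrite
done
```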
