fairwindsops / polaris Goto Github PK

View Code? Open in Web Editor NEW

3.1K 53.0 210.0 53.8 MB

Validation of best practices in your Kubernetes clusters

Home Page: https://www.fairwinds.com/polaris

License: Apache License 2.0

Dockerfile 0.28% Go 92.94% CSS 2.98% JavaScript 0.62% Shell 3.17%

kubernetes audit best-practices cluster dashboard fairwinds-official hacktoberfest

polaris's Issues

Only two namespaces inspected

I am trying Polaris in a cluster with 11 namespaces, but only kube-syste, and polaris are shown in the report.

Any clue?

Thanks

Allow rule exceptions for particular resources/namespaces

There are some containers/pods/controllers that need to have access to a feature that might typically be disallowed by a Polaris configuration. For instance, a particular container might need runAsNonRoot=false to function properly.

In these cases, we could disable that particular check for that particular resource, e.g. by using an annotation on the resource. This would cut down on noise in the report, and allow every team to strive for a score of 100.

Of course, we'll want to discourage folks from adding exceptions when they're unnecessary just to bypass Polaris, so this will take some thinking in terms of UX.

CI/CD uses branch of reactiveops/charts rather than master

If a fix is merged into the master branch, but not the polaris-latest branch, it won't make it into polaris releases.

Create favicon

Would David have an asset here? Can also just crop the existing logo

Changelog Updates?

The new releases have changelogs listed and appears to be on version 0.1.3 now, but https://github.com/reactiveops/polaris/blob/master/CHANGELOG.md has not been updated with those changes per release.

Get deployment pod names populated

Currently pod names are left blank. Looks like it's not super trivial to get these plumbed through.

Expose cluster name?

If you run polaris in many clusters, without creating an ingress (custom dns per each cluster), it would be nice if the header area also showed the cluster name (can be read from kubeconfig?)

Thoughts?

Design updated configuration schema that will support all of v1

An update configuration schema should be designed that will support all of the existing validations along with all planned validations before v1, including:

Pull policy always (warning)
No host networking
No host port
No host IPC
No host pid
Restricting kernel capabilities by default like SYS_ADMIN (warning)
No privileged containers (warning)
No root user (warning)
Should use read only root filesystem (warning)
Don't mount /var/run/docker.sock

As is likely obvious here, this will also need to find some kind of way to differentiate from errors and warnings.

Add Validation for Image Pattern Matching

We should have a list of patterns that an image attribute must match one of, or a list of patterns that an image attribute should not match any of. Generally these would never be set at the same time, but I can't think of any reason we need to add any logic to ensure that they're never both set. I think the most sensible approach here is to ensure that container.image matches at least one pattern defined in the whitelist or does not match any patterns defined in the blacklist. We do not want to do full regex matching here, but supporting a * wildcard character would simplify things. Ideally we could create a small method that determined if a blacklist or whitelist item matched a string, and then write a bunch of unit tests to cover use cases like:

quay.io/reactiveops*

quay.io/reactiveops (invalid)
quay.io/reactiveops/ (valid)
quay.io/unreactiveops (invalid)
quay.io/reactiveops/rbac-manager (valid)

quay.io/*ops*

quay.io/reactiveops (invalid)
quay.io/reactiveops/ (valid)
quay.io/unreactiveops (valid)
quay.io/reactiveops/rbac-manager (valid)

quay.io/reactiveops/rbac-manager

quay.io/reactiveops (invalid)
quay.io/reactiveops/ (invalid)
quay.io/unreactiveops (invalid)
quay.io/reactiveops/rbac-manager (valid)

Custom checks?

Is it possible to add custom checks?

Add Validation For images.pullPolicyNotAlways

This should verify that container.pullPolicy is not always. A variety of similar validations are available in the codebase.

Search and filter namespace

Hello,

I think it would be helpful to build a search input field to filter out one or more specific namespaces with statistics, when you only want to focus on these.

CrashLoopBackOff when running Polaris Webhook in kind cluster

Repro Steps

Install kind
Spin up a local, 2-node cluster: kind create cluster --config=config.yaml && export KUBECONFIG=~/.kube/kind-config-kind
Deploy Polaris Webhook: kubectl apply -f https://raw.githubusercontent.com/reactiveops/polaris/master/deploy/webhook.yaml
Describe Pod: kubectl -n polaris describe pod polaris

Expected Behavior

Webhook pod should be in a good state and running.

Actual Behavior

Pod goes into CrashLoopBackOff.

Additional Info

Screenshot

Kind Cluster Configuration

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
  - role: control-plane
  - role: worker

static content is not loaded when using nginx-ingress with custom base path

We would like to get polaris dashboard accessible from subpath of our domain (e.g. https://example.com/polaris)

Issue is static content is not loaded as deployment expects them without polaris path.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example.com-polaris
  namespace: polaris
  annotations:
    kubernetes.io/ingress.provider: "nginx"
    kubernetes.io/ingress.class: "nginx-ks"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/secure-backends: "false"
    nginx.ingress.kubernetes.io/rewrite-target: "/$1"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      rewrite ^(/polaris)$ $1/ permanent;
spec:
  tls:
    - hosts:
        - example.com
      secretName: star-example-com
  rules:
    - host: cluster.notifai.io
      http:
        paths:
          - path: /polaris/?(.*)
            backend:
              serviceName: polaris-dashboard
              servicePort: 8080

Provide clear path for adding additional validations

Ideally this should include both some basic documentation and potentially cleaning up code to make the process simpler.

Deploying new version results in the errors due to improper clusterrole

so i deployed a new version, 0.2 of polaris and it resulted in the below error.. i was able to solve by changing the clusterrole permissions for polaris-dashboard. below is the error for the reference.

time="2019-06-21T21:12:50Z" level=info msg="Starting Polaris dashboard server on port 8080" time="2019-06-21T21:13:19Z" level=error msg="Error fetching Nodes nodes is forbidden: User \"system:serviceaccount:polaris:polaris-dashboard\" cannot list resource \"nodes\" in API group \"\" at the cluster scope" time="2019-06-21T21:13:19Z" level=error msg="Error fetching Kubernetes resources nodes is forbidden: User \"system:serviceaccount:polaris:polaris-dashboard\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"

Readiness probes should be ignored on initContainers

The Kubernetes documentation states that readiness probes are not supported on initContainers:

https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

Also, Init Containers do not support readiness probes because they must run to completion before the Pod can be ready.

Based on this, I think that lack of a readiness probe should be ignored for initContainers. This would bring my cluster's score up a bit. 😃

Include default config with distributed binaries

When running the binary release, it will break when there isn't a config.yaml file present. We should find a way to include default config directly in the binary distribution.

Example configs don't contain displayName

The example configs don't contain an example of setting the display name in yaml configs.

Also --display-name is missing from the README.md docs for flags

Terraform Module

Would you accept a PR that added Polaris as a Terraform module?

LimitRange is ignored for cpu/mem limit/range report.

We are using LimitRange for some namespaces. But Polaris still shows "CPU requests should be set" warning.

apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - defaultRequest:
      cpu: 50m
      memory: 256Mi
    default:
      cpu: 200m
      memory: 256Mi
    type: Container

Error creating Kubernetes client/resources when using OpenID Connect as identity provider

After polaris --dashboard command I am getting:

Error creating Kubernetes client No Auth Provider found for name
Error creating Kubernetes resources No Auth Provider found for name

PS. I am using dexter - a tool for creating and authenticating kubectl users via Google's OpenID Connect.

Issue with runAsNonRoot

Not entirely sure that your check for runAsNonRoot is working, or we mis-understand exactly what it's checking. We have a pod running which is set at the pod level as below, yet reactive is still saying it shouldn't run as root... which it isn't. Since the securityContext at the container level is only for overriding what is set at the pod level I hope that being set at the pod level is enough. Any ideas?

securityContext:
runAsNonRoot: true
runAsUser: 5000

Add Validation for Security Capabilities

The idea here is to be able to whitelist or blacklist securityContext.capabilities. This is going to be a bit more difficult to implement as we'll have to have our own concept of the defaults set here as I don't think it will actually show up in the spec here. Alternatively it might be more straightforward to have separate whitelist and blacklist support for capabilities that have been added and dropped. This one will take some time to figure out, and likely will involve some changes to the config syntax.

Data structures and types are overly repetitive, this should be refactored

Ideally we should be able to use the same base set of data structures for both the webhook and dashboard, that's not the case yet. It would be good to consolidate and simplify these types as a starting point.

Not able to find sample values.yaml

I prefer having values.yaml, i could not find one. If there is any, can you guys point me to one.

TIA

Add validation for simple security checks

This involves 4 bits of configuration:

security.runAsPriviliged: This container check should fail if container.securityContext.privileged is true
security.runAsRootAllowed: This pod and container check should fail if pod.securityContext.runAsNonRoot or container.securityContext.runAsNonRoot is not true
security.notReadOnlyRootFileSystem: This container check should fail if container.securityContext.readOnlyRootFileSystem is not true
security.privilegeEscalationAllowed: This container check should fail if container.securityContext.allowPrivilegeEscalation is true

If I don't get to it first, this will also involve some changes to the existing config package Security struct.

how often does data refresh?

launched Polaris in my cluster, found some problems with my deployments. Fixed a few things, re-deployed the offending applications... No change to the dashboard (even after an hour or two).

If I delete the Polaris pod, it'll restart and shows the expected changed errors/warnings.

How often does polaris scan the cluster? How do I trigger a new scan?

Adjust potential os.exit(1) for resource creation

There is a call to config.GetConfigOrDie() in resource.go:123. This attempts to load the config, but if unsuccessful will call an os.Exit(1) from inside resourse.go.

From a user experience perspective it may be better to leverage config.GetConfig() and handle the err != nil by returning an empty ResourceProvider and an error message. Where this function is invoked, already has error handling which would capture a more usable message to the user instead of calling an os.Exit(1).

Add testing for webhook using KIND

Is your feature request related to a problem? Please describe.
The webhook is the most potentially disruptive piece of code here, and is undertested. The reason for this is that it's difficult to unit test. That's a big part of why it's marked as experimental.

Describe the solution you'd like
My suggestion is that we write some end-to-end tests that add the webhook to a KIND cluster and make sure resources are accepted and rejected appropriately.

Describe alternatives you've considered
It would be nice if we could unit test as well...I wonder how well it plays with the fake k8s API we have in fixtures.go

results.json endpoint should set content-type header

The dashboard results.json output is not sending an application/json content-type. It's current set to plain text. It would be nice to have this header set so that viewers in the browser detect json.

This is a trivial and nice-to-have enhancement.

Add ability to run dashboard using an audit stored as JSON/YAML

Is your feature request related to a problem? Please describe.

Describe the solution you'd like
We'd like to be able to save an audit as JSON/YAML, and run the dashboard using that file

Describe alternatives you've considered

Additional context

Add ReactiveOps branding to dashboard

We should add:

ReactiveOps logo
Copyright notice
Link back to reactiveops.com
(maybe) a short description of what we do

The copyright can be dropped into a footer. For the logo and link, curious to hear what people think. A few options I can think of:

In the header, put "by ReactiveOps" or similar next to the FW logo
At the bottom of the page, put "Made with ☕️ by ReactiveOps" or similar, a la GitHub gists

For reference, Sonobuoy Scanner is referred to everywhere as "Heptio Sonobuoy Scanner", they put their logo at the bottom, along with a link saying "Made with love by Heptio".

CONTRIBUTING doc is missing instructions about packr

It seems like there should be some info about packr in the CONTRIBUTING doc so that developers know how to build the project locally for dashboard development.

Add checks to prevent host networking

Values that should be validated here include:

.spec.hostAliases
.spec.hostIPC
.spec.hostNetwork
.spec.hostPID
.spec.ports.*.hostPort

Helm Deployment for Dashboard uses webhook values override

The deployment template for the dashboard references the values image for webhook (.Values.webhook.image.repository) but it should be the dashboard one - .Values.dashboard.image.repository)

Remove Gopkg.lock and vendor folder

This ones are not necessary to store in the repo. It makes repo larger and longer to download.
Contributor can restore dependencies with dep util instead.

Dashboard webpage doesnt work in offline environment

The Dashboard web page will not work in an offline environment because it tries to pull in javascript and css files from the internet. By default we prevent accessing the internet and are generally opposed to allowing accessing. These javascript and CSS files should be included in the docker image.

Change output format?

I'm noticing that we have lots of logic to create ResultSummaries - basically iterating over an array of ResultMessages to count the number of successes/errors/warnings.

I think it'd simplify things to have a ResultSet type:

type ResultSet struct {
    Errors      []ResultMessage
    Warnings    []ResultMessage
    Successes   []ResultMessage
}

Then rather than aggregating totals, we can just use len(results.Errors). It'll also make it easier for downstream consumers to segregate errors, warnings and successes...right now the list of ResultMessages seems to implicitly sort with error > warning > succcess.

Any concerns with this approach? If not I'm happy to implement.

Running polaris locally complains about dashboard.gohtml

Steps to reproduce:

Install polaris on my local machine with brew
Download the default config-full.yaml from the root of this repo, change the name to config.yaml and make sure I'm in the same directory
Auth to my k8s cluster
Run polaris --dashboard. Log output

INFO[0009] Starting Polaris dashboard server on port 8080

Navigate to localhost:8080

Expected behaviour:
I see some sort of dashboard

Actual behaviour:
Console logs show

INFO[0021] Error getting template data stat /Users/<my username>/Code/polaris/templates/dashboard.gohtml: no such file or directory

Browser shows Error getting template data

Fix issues with linting

Use helm template to generate manifests

I think we should use the helm chart to generate the manifests that we keep in the deploy folder. We can keep different ones if we want using multiple values files.

As part of this, we should have a CI check that makes sure the manifests match the current iteration of the chart, as well as linting and testing the chart.

The question then becomes, once we go public, does the chart live in reactiveops/charts, or do we keep it here? Or do we keep it here and sync it to charts?

flags --set-exit-code-on-error --set-exit-code-below-score are not available in 0.2.1 when installed thorough brew

Was trying out 0.2.1 installed through Homebrew as suggested on the Readme. Wanted to try failing polaris in CI/CD when there are any error-level issues hence followed the command with flags advised.

polaris --audit --audit-path ./deploy/ \
  --set-exit-code-on-error \
  --set-exit-code-below-score 90

However got the error flag provided but not defined: -set-exit-code-on-error.

$ polaris --version
Polaris version 0.2.1

$ polaris --help
Usage of polaris:polaris
  -audit
    	Runs a one-time audit.
  -audit-path string
    	If specified, audits one or more YAML files instead of a cluster
  -config string
    	Location of Polaris configuration file
  -dashboard
    	Runs the webserver for Polaris dashboard.
  -dashboard-base-path string
    	Path on which the dashboard is served (default "/")
  -dashboard-port int
    	Port for the dashboard webserver (default 8080)
  -disable-webhook-config-installer
    	disable the installer in the webhook server, so it won't install webhook configuration resources during bootstrapping
  -display-name string
    	An optional identifier for the audit
  -kubeconfig string
    	Paths to a kubeconfig. Only required if out-of-cluster.
  -log-level string
    	Logrus log level (default "info")
  -master string
    	The address of the Kubernetes API server. Overrides any value in kubeconfig. Only required if out-of-cluster.
  -output-file string
    	Destination file for audit results
  -output-format string
    	Output format for results - json, yaml, or score (default "json")
  -output-url string
    	Destination URL to send audit results
  -version
    	Prints the version of Polaris
  -webhook
    	Runs the webhook webserver.
  -webhook-port int
    	Port for the webhook webserver (default 9876)

Did I miss anything?

Current state of frontend code is a mess, it should be cleaned up

Validate more resources than deployments

As referenced in our roadmap, we want to add support for a variety of additional resources, starting with those that act as parent resources such as pods, including:

StatefulSet
DaemonSet
Job
CronJob
ReplicaSet

Template error for docs

Going here http://localhost:8080/details/security yields this error:

template: check-details:18:33: executing "head" at <.JSON>: can't evaluate field JSON in type *dashboard.TemplateData

Version 0.1.3.

Currently, polaris reports a warning of cpu/memory limits and requests missing on initContainers but per https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources I don't think that setting limits/requests on initContainers is standard / best practice?

Correct me if wrong!