anchore / stereoscope

go library for processing container images and simulating a squash filesystem

License: Apache License 2.0

Go 97.84% Makefile 0.93% Dockerfile 0.49% Shell 0.74%
container-image container go golang squashfs hacktoberfest

stereoscope's Introduction

stereoscope

A library for working with container image contents, layer file trees, and squashed file trees.

Getting Started

See examples/basic.go

docker image save centos:8 -o centos.tar
go run examples/basic.go ./centos.tar

Note: To run tests you will need skopeo installed.

Overview

This library provides the means to:

  • parse and read images from multiple sources, supporting:
    • docker V2 schema images from the docker daemon, podman, or archive
    • OCI images from disk, directory, or registry
    • singularity formatted image files
  • build a file tree representing each layer blob
  • create a squashed file tree representation for each layer
  • search one or more file trees for selected paths
  • catalog file metadata in all layers
  • query the underlying image tar for content (file content within a layer)

stereoscope's People

Contributors

5p2o5pe25out, ajvpot, amar-babu, bradleyjones, cpendery, dependabot[bot], dtrudg, errordeveloper, fengshunli, ferada, iaklis, jonasagx, jonathongardner, jonjohnsonjr, kushalbeniwal, kzantow, luhring, shanedell, spiffcs, testwill, tri-adam, vaikas, wagoodman, westonsteimel, willmurphyscode, wobito, xdavidwu

stereoscope's Issues

File tree hard links

Ensure that the filetree operations work as expected for hardlinked files in each layer.

SquashFS Iteration Fails when FIFO Present

What happened:

When iterating through a SquashFS layer within a SIF that contains a FIFO, an error is encountered.

could not read image: failed to walk layer="sha256:14489b4001eb9184a44622395748011affff4f358134049e566e026ab380dc37": could not add path="/foo" link="" during squashfs iteration

What you expected to happen:

I expected SquashFS iteration to succeed, even when a FIFO is present.

How to reproduce it (as minimally and precisely as possible):

Generate a SquashFS partition that contains a fifo:

$ mknod foo p
$ mksquashfs ./foo rootfs.sq
...

Use siftool (available here) to create a SIF containing the SquashFS partition:

$ siftool new fifo.sif
$ siftool add --datatype 4 --partarch 1 --partfs 1 --parttype 2 fifo.sif rootfs.sq

Process with stereoscope:

$ go run examples/basic.go singularity:fifo.sif
DEBU[0000] image: source=Singularity location=fifo.sif  
DEBU[0000] image metadata: digest=sha256:affd044aad62d0250e63e4bf1dcc8f385fe956525c79d6cea8c7dfb3eee96c47 mediaType=application/vnd.sylabs.sif.layer.v1.sif tags=[] 
DEBU[0000] layer metadata: index=0 digest=sha256:14489b4001eb9184a44622395748011affff4f358134049e566e026ab380dc37 mediaType=application/vnd.sylabs.sif.layer.v1.squashfs 
panic: could not read image: failed to walk layer="sha256:14489b4001eb9184a44622395748011affff4f358134049e566e026ab380dc37": could not add path="/foo" link="" during squashfs iteration

goroutine 1 [running]:
main.main()
        /home/adam/src/stereoscope/examples/basic.go:32 +0x73b
exit status 2

Anything else we need to know?:

Environment:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Cannot accept image ID as input

Currently cannot provide a Docker image ID as input (example with grype):

go run main.go 34ca5da54570

[ERROR] failed to catalog: could not fetch image '34ca5da54570': unable to trace image save progress: unable to inspect image: Error: No such image: index.docker.io/library/34ca5da54570:latest
exit status 1

Opaque directories not merged correctly during image squash

Stereoscope is mishandling situations where opaque directories in upper layers are merged down into lower layers, particularly when the directory doesn't exist in the lower layer.

There are two noteworthy problems here:

  1. The opaque directory is not added to the lower layer at the end of the merge (and it should be).
  2. Because of the above problem, opaque directories are added implicitly during the merge in such a way that prevents these directories from being correctly added to the image's FileCatalog, on which consumers depend. Consumers that try to access such directories in a squashed image receive the error "could not find file". This implicit addition of directories during merge operations should be removed.

Tasks:

  • add tests for this case
  • add the directory from the upper tree even when it contains .wh..wh..opq during merge
  • don't allow for implicit addition of parents during AddPath during merge
  • validate that when we create a tree, all its entries are found in the FileCatalog
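
A minimal sketch of the merge behaviour the tasks above ask for, using a plain list of paths per layer instead of stereoscope's filetree types. The squash function and the path-list model are assumptions made for illustration; only the .wh..wh..opq marker name comes from the issue. Note that the opaque directory is added explicitly, even when the lower layer never contained it and the upper layer only lists its children.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// opaqueMarker is the whiteout file name that marks a directory as opaque.
const opaqueMarker = ".wh..wh..opq"

// squash merges the paths of an upper layer onto a lower layer.
// Hypothetical helper: layers are modelled as flat lists of absolute paths.
func squash(lower, upper []string) []string {
	merged := make(map[string]struct{}, len(lower)+len(upper))
	for _, p := range lower {
		merged[p] = struct{}{}
	}

	// First pass: an opaque marker hides everything the lower layer had
	// under that directory, but the directory itself is kept (or created).
	for _, p := range upper {
		if path.Base(p) != opaqueMarker {
			continue
		}
		dir := path.Dir(p)
		prefix := strings.TrimSuffix(dir, "/") + "/"
		for existing := range merged {
			if strings.HasPrefix(existing, prefix) {
				delete(merged, existing)
			}
		}
		merged[dir] = struct{}{} // explicit, not implicit, addition
	}

	// Second pass: add the upper layer's real entries (markers are dropped).
	for _, p := range upper {
		if path.Base(p) != opaqueMarker {
			merged[p] = struct{}{}
		}
	}

	out := make([]string, 0, len(merged))
	for p := range merged {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	lower := []string{"/etc", "/etc/passwd"}
	upper := []string{"/data/.wh..wh..opq", "/data/new.txt", "/etc/.wh..wh..opq", "/etc/hosts"}
	for _, p := range squash(lower, upper) {
		fmt.Println(p)
	}
}
```

In this model, /data appears in the result even though neither layer lists it as a plain entry, which is the case the issue describes as missing from the FileCatalog.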

Bulk file retrieval

It should be possible to give an image.Image a list of file References and efficiently extract their contents, regardless of the target layer.

Support for image indexes with multiple manifests

What would you like to be added:

Currently stereoscope will throw an error when using image providers when they are passed an image index that contains references to multiple manifests. I'd like to request that this behaviour is changed (or a new feature is implemented) to support image indexes with references to multiple manifests.

Why is this needed:

Multi-manifest image indexes are becoming increasingly common, especially with the prominence of multiple supported architectures for containers (primarily amd64 and arm64). If stereoscope is passed one of these multi-manifest indexes, it errors out with no ability to filter or select one of the manifests from the index (at least with the oci-dir provider).

My primary use-case for this change is to be able to use Syft to generate SBOMs for each manifest within a multi-arch oci-dir. While I'm unsure exactly how the SBOMs will be generated for multi-arch images, the changes I am requesting here seem to be a prerequisite to be able to support any type of multi-arch SBOM (or generating SBOMs per image manifest) when given a multi-manifest index.

Additional context:

I have already prototyped the code for supporting this and plan to open a PR, but thought I would open an issue first. My main concern about making this change is the extent to which changing or breaking the existing API is allowed.

My current implementation adds a new Index struct and a new IndexProvider interface with a ProvideIndex method, so providers can optionally add support for multi-manifest indexes. However, this feels very awkward for two reasons.

First, providers will still error out if you call Provide with a multi-manifest index, because the existing API is not changed at all and Provide still only supports a single image.

Second, there would now be another entire code path to support multi-manifest indexes. While this code path does still fully support single-manifest indexes (or just single images if the format doesn't have the concept of multiple manifests), it does mean users of this library will need to implement support for a different API, and new users may be confused about when to use the Index functions versus the non-index functions.
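
To make the "filter or select one of the manifests" request concrete, here is a small sketch of platform-based selection from an index. Platform, ManifestRef, and SelectManifest are hypothetical names for illustration; they are not the Index or IndexProvider types from the prototype described above, and the digests are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Platform identifies the OS/architecture pair a manifest was built for.
type Platform struct {
	OS           string
	Architecture string
}

// ManifestRef is a simplified stand-in for one entry of an image index.
type ManifestRef struct {
	Digest   string
	Platform Platform
}

// ErrNoMatchingManifest is returned when the index has no entry for the platform.
var ErrNoMatchingManifest = errors.New("no manifest matches the requested platform")

// SelectManifest returns the first index entry whose platform equals want.
func SelectManifest(index []ManifestRef, want Platform) (ManifestRef, error) {
	for _, m := range index {
		if m.Platform == want {
			return m, nil
		}
	}
	return ManifestRef{}, ErrNoMatchingManifest
}

func main() {
	index := []ManifestRef{
		{Digest: "sha256:amd64-example", Platform: Platform{OS: "linux", Architecture: "amd64"}},
		{Digest: "sha256:arm64-example", Platform: Platform{OS: "linux", Architecture: "arm64"}},
	}

	m, err := SelectManifest(index, Platform{OS: "linux", Architecture: "arm64"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected:", m.Digest)
}
```

A selector like this could sit in front of the existing single-image path, which is one way to avoid a second API, though whether that fits the library's design is an open question in the issue.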

Panic on images with zero layers

Stereoscope panics when returning results for an image that has no layers (e.g. scratch). This can be seen by running the following from stereoscope's root directory:

skopeo --override-os linux copy docker://anchore/test_images:scratch_v1 oci:./test-scratch
go run ./examples/basic.go oci-dir:./test-scratch

The cause of this panic is here: https://github.com/anchore/stereoscope/blob/main/pkg/image/image.go#L200

This code assumes there is at least one layer in the image and thus doesn't check to make sure len(i.Layers)-1 produces a valid index.
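
A minimal sketch of the missing guard, using simplified stand-in types. Image, Layer, TopLayer, and ErrNoLayers here are hypothetical and are not stereoscope's real definitions; the point is only that the length check happens before len(i.Layers)-1 is used as an index.

```go
package main

import (
	"errors"
	"fmt"
)

// Layer and Image are simplified stand-ins for illustration only.
type Layer struct{ Digest string }

type Image struct{ Layers []*Layer }

// ErrNoLayers signals an image without any layers (e.g. scratch).
var ErrNoLayers = errors.New("image has no layers")

// TopLayer returns the uppermost layer, or ErrNoLayers instead of panicking
// with an out-of-range index when the image is empty.
func (i *Image) TopLayer() (*Layer, error) {
	if len(i.Layers) == 0 {
		return nil, ErrNoLayers
	}
	return i.Layers[len(i.Layers)-1], nil
}

func main() {
	empty := &Image{}
	if _, err := empty.TopLayer(); err != nil {
		fmt.Println("empty image:", err)
	}

	img := &Image{Layers: []*Layer{{Digest: "sha256:aaa"}, {Digest: "sha256:bbb"}}}
	if top, err := img.TopLayer(); err == nil {
		fmt.Println("top layer:", top.Digest)
	}
}
```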

NOTE: If you try to use Docker Engine to save the image you'll get a totally different error, as seen here:

go run ./examples/basic.go anchore/test_images:scratch_v1
panic: unable to save image tar: Error response from daemon: empty export - not implemented

Add OCI image support

Currently only supports docker images. The github.com/google/go-containerregistry dependency was specifically selected due to the OCI v1 interface abstraction, allowing for multiple providers of container image formats that mapped to OCI v1 features. Since all stereoscope features depend on this abstraction (and are not hard coded to any specific format) it should be a relatively easy lift to implement OCI support from multiple sources (directory, tar archives, etc).

NewTarIndex race condition when run in a goroutine

What happened:

In our build system we pull+cache images using crane in an OCI layout format. We then attempt to SBOM these images which may share layer tarballs.

Concurrently running Image.Read results in the following error being returned:

l.indexedContent, err = file.NewTarIndex(
	tarFilePath,
	layerTarIndexer(tree, l.fileCatalog, &l.Metadata.Size, l, monitor),
)
if err != nil {
	return fmt.Errorf("failed to read layer=%q tar : %w", l.Metadata.Digest, err)
}

failed to read layer="sha256:994393dc58e7931862558d06e46aa2bb17487044f670f310dffe1d24e4d1eec7" tar : unexpected EOF

I believe this is caused by stereoscope re-using file handles in os.Open resulting in a race condition of reads.

What you expected to happen:

Generating SBOMs for images should be goroutine-safe, even when the images share layers.
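
If the diagnosis above is right, one way to get there is for every reader to open its own handle on a shared tarball, so that no two goroutines share a file offset. The sketch below illustrates only that idea with the standard library; readConcurrently and writeTemp are hypothetical helpers and do not reflect how file.NewTarIndex is implemented.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// readConcurrently reads the same file from several goroutines. Each worker
// opens its own *os.File, so each has an independent read offset.
func readConcurrently(path string, workers int) ([][]byte, error) {
	results := make([][]byte, workers)
	errs := make([]error, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			f, err := os.Open(path)
			if err != nil {
				errs[w] = err
				return
			}
			defer f.Close()
			results[w], errs[w] = io.ReadAll(f)
		}(w)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

// writeTemp writes content to a new temporary file and returns its path.
func writeTemp(content []byte) (string, error) {
	f, err := os.CreateTemp("", "layer-*.tar")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.Write(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	p, err := writeTemp([]byte("pretend this is a shared layer tarball"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(p)

	results, err := readConcurrently(p, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("workers that read the full file:", len(results))
}
```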

How to reproduce it (as minimally and precisely as possible):

In a system w/ low IO speed (Github's default runner works):

  1. Pull multiple images that share layers across them and store as OCI layout (images w/ alpine base work)
  2. Attempt to go-routine image SBOMing

Anything else we need to know?:

Full stereoscope + syft logs in our CI run: https://github.com/defenseunicorns/zarf/actions/runs/5485435230/jobs/9994240328?pr=1887

Environment:

  • Github default runner ubuntu-latest

error on filetree merge

What happened: While building a new container image for testing, the following error message is displayed (running syft):

[0004] ERROR filetree merge failed to remove path (path=/usr/local/share/ca-certificates): %!w(*errors.errorString=&{unable to remove node: /usr/local/share/ca-certificates}) from-lib=steroscope

As part of creating the container, we remove the ca-certificates package. Perhaps we shouldn't do that? Either way, it doesn't matter for what the image is intended to do.

Environment:

  • OS (e.g: cat /etc/os-release or similar): OSX, using syft on the tip of main

Add release process

Currently stereoscope is unversioned (with the exception of the first tag manually performed). Moving forward we will be committing to tagged releases of stereoscope periodically. This issue represents the work to add the workflows, process docs, and helper scripts to facilitate releases.

SquashFS iteration fails with "unexpected EOF"

What happened:

When iterating through a SquashFS layer within particular SIF images, an error is encountered. The image described in anchore/syft#1150 is one example of this.

What you expected to happen:

Expected SquashFS iteration to succeed with well-formed SIF images.

How to reproduce it (as minimally and precisely as possible):

Install SingularityCE and build a SIF:

singularity build OneAPI.sif docker://intel/oneapi-hpckit:latest

Process with stereoscope:

$ go run examples/basic.go singularity:OneAPI.sif 
DEBU[0000] image: source=Singularity location=OneAPI.sif 
DEBU[0015] image metadata: digest=sha256:2f58e7c87a19782f3f459bd8db1393fb37740a38e78957228200a753d31de312 mediaType=application/vnd.sylabs.sif.layer.v1.sif tags=[] 
DEBU[0015] layer metadata: index=0 digest=sha256:11cc80431c3b72354f0a2c551508592b874579be91ab23a75343037f6a44f93e mediaType=application/vnd.sylabs.sif.layer.v1.squashfs 
panic: could not read image: failed to walk layer="sha256:11cc80431c3b72354f0a2c551508592b874579be91ab23a75343037f6a44f93e": open opt/intel/oneapi/compiler/2022.1.0/linux/lib/oclfpga/host/linux64/bin/perl/lib/5.30.3/pod/perlpod.pod: unexpected EOF

goroutine 1 [running]:
main.main()
        /home/adam/src/stereoscope/examples/basic.go:32 +0x73b
exit status 2

Anything else we need to know?:

The error occurs with or without the patch from #139 applied, so I believe this issue is distinct from #138.

Environment:

$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ singularity --version
singularity-ce version 3.10.2

filepath Windows ==> Linux cleaning bug

What happened:

[0001] ERROR failed to cleanup: %!w(*multierror.Error=&{[0xc00022bb90 0xc00022bc20] <nil>}) from-lib=stereoscope
1 error occurred:
        * failed to determine image source: could not fetch image 'ubuntu:latest': 
        * could not read image: failed to read layer="sha256:da55b45d310bb8096103c29ff01038a6d6af74e14e3b67d1cd488c3ab03f5f0d" tar : 
        * failed to visit tar entry="etc/.pwd.lock" : 
        * failed visitor on tar indexEntry: 
        * unable to find parent path="\\etc" 
        * while adding path="/etc/.pwd.lock"

What you expected to happen:
Parent Path should be /etc but is \etc

How to reproduce it (as minimally and precisely as possible):
Checkout and run the windows support branch on syft on a windows machine.

go run main.go ubuntu:latest

Source of bug

// Normalize returns the cleaned file path representation (trimmed of spaces and resolve relative notations)
func (p Path) Normalize() Path {
	trimmed := strings.Trim(string(p), " ")
	if trimmed == "/" {
		return Path(trimmed)
	}
	return Path(filepath.Clean(strings.TrimRight(trimmed, DirSeparator)))
}

filepath.Clean rewrites the result using the host OS path separator (a backslash on Windows), even though the path being parsed comes from a Linux-based container file system.
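
One possible fix, sketched below, is to clean with the standard library's path package, which always treats "/" as the separator regardless of the host OS. This mirrors the Normalize function quoted above with only that call swapped; the Path type and DirSeparator constant are redeclared here so the example stands alone, and whether this is the change the project adopted is not stated in the issue.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Path and DirSeparator are redeclared for this standalone sketch.
type Path string

const DirSeparator = "/"

// Normalize trims spaces and resolves relative notation using slash-separated
// semantics on every OS, because container paths are always slash-separated.
func (p Path) Normalize() Path {
	trimmed := strings.Trim(string(p), " ")
	if trimmed == "/" {
		return Path(trimmed)
	}
	// path.Clean never converts "/" to the host separator, unlike filepath.Clean.
	return Path(path.Clean(strings.TrimRight(trimmed, DirSeparator)))
}

func main() {
	for _, p := range []Path{"/etc/.pwd.lock/..", " /etc/ ", "/usr//bin/./ls", "/"} {
		fmt.Printf("%q -> %q\n", p, p.Normalize())
	}
}
```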

Anything else we need to know?:

Environment:

  • OS (e.g: cat /etc/os-release or similar): Windows

Relevant PR:
anchore/syft#548

Improve test coverage to >= 80%

Once coverage is at a good threshold, add a quality gate to the pipeline to prevent regression of coverage below a threshold.

Image Read Hook

What would you like to be added:
I would like Image.Read() to accept a hook for skipping layers that do not need to be read again. For example:

func (i *Image) Read(hook func(layer *Layer) bool) error {
...
	for idx, v1Layer := range v1Layers {
		layer := NewLayer(v1Layer)
		// skip layers the caller reports as already processed
		if hook(layer) {
			continue
		}
		err := layer.Read(&i.FileCatalog, i.Metadata, idx, i.contentCacheDir)
		if err != nil {
			return err
		}
		i.Metadata.Size += layer.Metadata.Size
		layers = append(layers, layer)

		readProg.N++
	}
...
}

Ideally this hook would also be exposed through GetImage().

Why is this needed:
Because layers are overlaid, images built on the same base share base layers with identical digests. Skipping layers that have already been read avoids wasting resources on repeated work.

Additional context:

Odd error message when docker.socket in home dir is used

What happened:
In later versions of Docker Desktop the default docker socket appears to be in ~/.docker/run instead of at /var/run/docker.sock (Docker Desktop settings > Advanced > uncheck enable "default docker socket"). Even when the symlink from /var/run/docker.sock to the alternative location is missing, the docker CLI works fine. Syft, however, fails with:

$ syft docker:alpine:latest

[0000]  WARN scheme "docker" specified, but it coincides with a common image name; re-examining user input "docker:alpine:latest" without scheme parsing because image retrieval using scheme parsing was unsuccessful: unable to use DockerDaemon source: unable to inspect existing image: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? from-lib=syft
error: 1 error occurred:
	* unable to analyze image: unable to make new source: could not fetch image "alpine:latest": unable to determine image source to select platform

What you expected to happen:

  • The error message should be clear that you specified docker but the docker daemon was not accessible... we shouldn't try to keep going any further down this path. (This might not be possible lexically, so this is a soft ask).
  • The unknown source type isn't in the setPlatform switch case which is why this fails, could we make this pass this switch and fail on the pull?
  • Syft should have worked by falling back to the home dir socket! What mechanisms are in place for the docker CLI where this works for them but not us? (note: DOCKER_HOST is not at play here).

Note: The behavior is described from syft's point of view, but this is a stereoscope concern.
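
The fallback asked for in the last bullet could look roughly like the sketch below: try the conventional socket first, then the home-directory location. The candidate list, the docker.sock file name under ~/.docker/run, and the function names are assumptions made for illustration; how the docker CLI actually locates its endpoint is not described in this issue.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidateSockets lists socket locations to probe, in priority order.
// The second entry assumes the socket file is named docker.sock.
func candidateSockets(home string) []string {
	return []string{
		"/var/run/docker.sock",
		filepath.Join(home, ".docker", "run", "docker.sock"),
	}
}

// firstExisting returns the first path for which exists reports true.
// The check is injected so the selection logic can be exercised without a daemon.
func firstExisting(paths []string, exists func(string) bool) (string, bool) {
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// onDisk reports whether something is present at the given path.
func onDisk(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	if sock, ok := firstExisting(candidateSockets(home), onDisk); ok {
		fmt.Println("would connect to unix://" + sock)
	} else {
		fmt.Println("no docker socket found in the known locations")
	}
}
```

When no candidate exists, the caller is in a position to report clearly that the docker daemon was not reachable, which is what the first bullet asks for.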

Cleanup temp directories across image providers

What happened:
See docker/sbom-cli-plugin#22 for the original problem; invocations of docker sbom (which uses the docker daemon provider) are not cleaning up temp files.

What you expected to happen:
All temp directories should be cleaned up.

How to reproduce it (as minimally and precisely as possible):
Run docker sbom alpine:latest, then ls the temp directory: sbom-cli-plugin-* directories are still left behind (docker sbom uses stereoscope).

Anything else we need to know?:
Today the docker daemon provider uses the tarball provider, and each creates sibling temp directories under a common root directory. We should clean up the image.tar as soon as we create and provide an image from the tarball provider (from within the docker daemon provider). The downside is that multiple calls to provide() will then fail, which hints that another solution may be needed here.

Stereoscope shouldn't generate its own OCI manifests

What happened:

Stereoscope generates OCI manifests for images from the Docker "tarball provider", for which stereoscope doesn't have access to OCI manifests. See: https://github.com/anchore/stereoscope/blob/main/pkg/image/docker/tarball_provider.go#L69

What you expected to happen:

Stereoscope should not attempt to generate OCI manifests — it should capture and surface an image's OCI manifest if it already exists, or it should accept that there is no OCI manifest available.

Generated manifests (and the resulting manifest digests) are non-authoritative, and they don't fulfill all of a user's expectations for consuming an OCI manifest, such as using the manifest's digest to identify and retrieve OCI images. It's not clear what value generated OCI manifests add to users, and we're finding that they can even be confusing and problematic for users (see anchore/grype#435).

How to reproduce it (as minimally and precisely as possible):

I've created an example that calls stereoscope's stereoscope.GetImage function and prints out the manifest data:

https://github.com/luhring/stereoscope/blob/show-manifest-info-for-image-from-docker/examples/manifest_info.go

Steps:

  1. Ensure that the Docker daemon is available on your local machine.
  2. Check out luhring/stereoscope to the branch show-manifest-info-for-image-from-docker.
  3. Run go run ./examples/manifest_info.go <image>, for an image reference like ubuntu:latest.
  4. Try to use the reported manifest digest to find or retrieve the image you just analyzed. (E.g. the digest I get is sha256:aac1b1ac3ff329b251d567fba305a8212d1159a706ce038f24f0adc2b996680f.)
  5. Observe that no image can be found for this digest.

Symlink not being recognized

What happened:

I'm opening the node:14 docker image, which has a symlink at /usr/bin/X11, but it's not marked as type symlink:

root@53845b2997a0:/app# stat /usr/bin/X11
  File: /usr/bin/X11 -> .
  Size: 1               Blocks: 0          IO Block: 4096   symbolic link
Device: 100064h/1048676d        Inode: 6157740     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-05-03 09:38:16.000000000 +0000
Modify: 2017-05-03 09:38:16.000000000 +0000
Change: 2023-01-31 04:54:29.376134370 +0000
 Birth: -
❯ ./stereoscope-reproducation docker:node:14
opening image docker:node:14
Saw the following types:
  TypeRegular (1511 times)
  TypeDirectory (627 times)
error: max allowable directory traversal depth reached (maybe a link cycle?)

(this also happens for at least /usr/bin/2to3-2.7, and I assume others)

What you expected to happen:

For the file to be of type symlink.

I've confirmed that symlinks are being detected in other images, e.g.

❯ ./stereoscope-reproducation docker:postgres:12
opening image docker:postgres:12
skipping link /etc/alternatives/awk.1.gz
skipping link /etc/alternatives/builtins.7.gz
skipping link /etc/alternatives/nawk.1.gz
skipping link /etc/alternatives/pager.1.gz
skipping link /etc/alternatives/rmt.8.gz
skipping link /usr/share/doc/libxml2/NEWS.gz
skipping link /usr/share/doc/perl/Changes.gz
skipping link /usr/share/doc/postgresql-12/README.Debian.gz
Saw the following types:
  TypeRegular (10885 times)
  TypeDirectory (1485 times)

How to reproduce it (as minimally and precisely as possible):

`main.go`
package main

import (
		"context"
		"fmt"
		"github.com/anchore/stereoscope"
		"github.com/anchore/stereoscope/pkg/file"
		"github.com/anchore/stereoscope/pkg/filetree"
		"github.com/anchore/stereoscope/pkg/filetree/filenode"
		"os"
)

func readImage(imagePath string) int {
		// context for network requests
		ctx, cancel := context.WithCancel(context.Background())
		defer cancel()

		var err error

		fmt.Printf("opening image %s\n", imagePath)
		img, err := stereoscope.GetImage(ctx, imagePath)

		if err != nil {
				return 1
		}

		// note: we are writing out temp files which should be cleaned up after you're done with the image object
		defer img.Cleanup()

		seenTypes := make(map[file.Type]int)

		err = img.SquashedTree().Walk(
				func(path file.Path, f filenode.FileNode) error {
						if _, ok := seenTypes[f.FileType]; !ok {
								seenTypes[f.FileType] = 0
						}

						seenTypes[f.FileType]++

						if f.IsLink() {
								fmt.Printf("path: %s\n", path)
						}
						return nil
				},
				&filetree.WalkConditions{
						ShouldVisit: func(path file.Path, node filenode.FileNode) bool {
								return !node.IsLink()
						},
						ShouldContinueBranch: func(path file.Path, node filenode.FileNode) bool {
								if node.IsLink() {
										fmt.Printf("skipping link %s\n", path)
										return false
								}

								return true
						},
				},
		)

		fmt.Printf("Saw the following types:\n")
		if count, ok := seenTypes[file.TypeRegular]; ok {
				fmt.Printf("  TypeRegular (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeHardLink]; ok {
				fmt.Printf("  TypeHardLink (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeSymLink]; ok {
				fmt.Printf("  TypeSymLink (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeCharacterDevice]; ok {
				fmt.Printf("  TypeCharacterDevice (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeBlockDevice]; ok {
				fmt.Printf("  TypeBlockDevice (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeDirectory]; ok {
				fmt.Printf("  TypeDirectory (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeFIFO]; ok {
				fmt.Printf("  TypeFIFO (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeSocket]; ok {
				fmt.Printf("  TypeSocket (%d times)\n", count)
		}
		if count, ok := seenTypes[file.TypeIrregular]; ok {
				fmt.Printf("  TypeIrregular (%d times)\n", count)
		}

		if err != nil {
				fmt.Printf("error: %v", err)

				return 1
		}

		return 0
}

func main() {
		os.Exit(readImage(os.Args[1]))
}
`go.mod`
module github.com/g-rath/stereoscope-reproducation

go 1.19

require github.com/anchore/stereoscope v0.0.0-20230222185948-fab1c9638abc

require (
	github.com/Microsoft/go-winio v0.5.2 // indirect
	github.com/anchore/go-logger v0.0.0-20220728155337-03b66a5207d8 // indirect
	github.com/becheran/wildmatch-go v1.0.0 // indirect
	github.com/bmatcuk/doublestar/v4 v4.0.2 // indirect
	github.com/containerd/containerd v1.6.18 // indirect
	github.com/containerd/stargz-snapshotter/estargz v0.10.0 // indirect
	github.com/docker/cli v20.10.12+incompatible // indirect
	github.com/docker/distribution v2.8.0+incompatible // indirect
	github.com/docker/docker v20.10.12+incompatible // indirect
	github.com/docker/docker-credential-helpers v0.6.4 // indirect
	github.com/docker/go-connections v0.4.0 // indirect
	github.com/docker/go-units v0.4.0 // indirect
	github.com/gabriel-vasile/mimetype v1.4.0 // indirect
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/golang/protobuf v1.5.2 // indirect
	github.com/google/go-containerregistry v0.7.0 // indirect
	github.com/google/uuid v1.3.0 // indirect
	github.com/hashicorp/errwrap v1.1.0 // indirect
	github.com/hashicorp/go-multierror v1.1.1 // indirect
	github.com/klauspost/compress v1.15.9 // indirect
	github.com/mitchellh/go-homedir v1.1.0 // indirect
	github.com/opencontainers/go-digest v1.0.0 // indirect
	github.com/opencontainers/image-spec v1.0.3-0.20211202183452-c5a74bcca799 // indirect
	github.com/pelletier/go-toml v1.9.5 // indirect
	github.com/pierrec/lz4/v4 v4.1.15 // indirect
	github.com/pkg/errors v0.9.1 // indirect
	github.com/scylladb/go-set v1.0.3-0.20200225121959-cc7b2070d91e // indirect
	github.com/sirupsen/logrus v1.8.1 // indirect
	github.com/spf13/afero v1.6.0 // indirect
	github.com/sylabs/sif/v2 v2.8.1 // indirect
	github.com/sylabs/squashfs v0.6.1 // indirect
	github.com/therootcompany/xz v1.0.1 // indirect
	github.com/ulikunitz/xz v0.5.10 // indirect
	github.com/vbatts/tar-split v0.11.2 // indirect
	github.com/wagoodman/go-partybus v0.0.0-20200526224238-eb215533f07d // indirect
	github.com/wagoodman/go-progress v0.0.0-20200621122631-1a2120f0695a // indirect
	golang.org/x/crypto v0.0.0-20220315160706-3147a52a75dd // indirect
	golang.org/x/net v0.7.0 // indirect
	golang.org/x/sync v0.0.0-20210220032951-036812b2e83c // indirect
	golang.org/x/sys v0.5.0 // indirect
	golang.org/x/text v0.7.0 // indirect
	google.golang.org/genproto v0.0.0-20220502173005-c8bf987b8c21 // indirect
	google.golang.org/grpc v1.47.0 // indirect
	google.golang.org/protobuf v1.28.0 // indirect
)

Anything else we need to know?:

I'm not a docker wiz or anything, so I should be using a pretty vanilla setup.

Environment:

  • OS (e.g: cat /etc/os-release or similar): Ubuntu 20.04 (via WSLv2)
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Cycle during symlink resolution prevents syft/grype scans from enumerating vulnerabilities

What happened:
We recently came across an image where a user had accidentally created a symlink loop within their container image. This resulted in the image failing both syft and grype scans.

This could be used by an attacker or developer who wants to hide vulnerabilities through malicious compliance. By generating a symlink loop, syft/grype will error and fail to output results. If scan errors are not closely monitored the image could avoid detection.

What you expected to happen:
Malformed symlinks should be logged, but allow the rest of the syft or grype scans to complete.
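
A small sketch of that behaviour: resolve each link with cycle detection, record the paths that cannot be resolved, and keep going. The symlink table is modelled as a plain map, and resolve, resolveAll, and ErrLinkCycle are hypothetical names; none of this is stereoscope's actual resolution code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLinkCycle reports that following links led back to an already visited path.
var ErrLinkCycle = errors.New("cycle during symlink resolution")

// resolve follows links (path -> target) until it reaches a path that is not
// a link, returning ErrLinkCycle if the same link is visited twice.
func resolve(links map[string]string, p string) (string, error) {
	seen := make(map[string]bool)
	for {
		target, isLink := links[p]
		if !isLink {
			return p, nil
		}
		if seen[p] {
			return "", ErrLinkCycle
		}
		seen[p] = true
		p = target
	}
}

// resolveAll resolves every path it can and collects the rest in skipped,
// so one bad link does not abort the whole scan.
func resolveAll(links map[string]string, paths []string) (resolved map[string]string, skipped []string) {
	resolved = make(map[string]string)
	for _, p := range paths {
		target, err := resolve(links, p)
		if err != nil {
			skipped = append(skipped, p) // a real scanner would log this
			continue
		}
		resolved[p] = target
	}
	return resolved, skipped
}

func main() {
	links := map[string]string{
		"/usr/bin/xz":    "/usr/bin/xzcat",
		"/usr/bin/xzcat": "/usr/bin/xz", // loop, as in the reproduction below
		"/usr/bin/awk":   "/usr/bin/mawk",
	}
	resolved, skipped := resolveAll(links, []string{"/usr/bin/xz", "/usr/bin/awk", "/bin/sh"})
	fmt.Println("resolved:", len(resolved), "skipped:", skipped)
}
```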

How to reproduce it (as minimally and precisely as possible):
Using the old, known-vulnerable image webgoat/webgoat-8.0:latest, grype reports many vulnerabilities:

% grype webgoat/webgoat-8.0:latest
 ✔ Vulnerability DB                [no update available]
 ✔ Pulled image
 ✔ Loaded image                                                                                                                                                             webgoat/webgoat-8.0:latest
 ✔ Parsed image                                                                                                                sha256:6664051b8808540cf920e5802e7eb025f9cb19346dcb0fc2be137a26979bb111
 ✔ Cataloged packages              [315 packages]
 ✔ Scanned for vulnerabilities     [581 vulnerability matches]
   ├── by severity: 73 critical, 185 high, 152 medium, 60 low, 111 negligible
   └── by status:   304 fixed, 277 not-fixed, 0 ignored

Build a downstream image and create a symlink loop in a cataloged binary, /usr/bin/xz for example:

echo "FROM webgoat/webgoat-8.0:latest
USER root
RUN yes | ln -sfi /usr/bin/xzcat /usr/bin/xz
USER webgoat" | docker build -t symlink-loop:latest .

Syft error:

% syft symlink-loop:latest
 ✔ Loaded image                                                                                                                                                                    symlink-loop:latest
 ✔ Parsed image                                                                                                                sha256:108309a3bb9c201ffe0f2c4fc1300f434485e5216d6edad8c76d6d2ec8d3e7de
 ✔ Cataloged packages              [315 packages]
[0003]  WARN unable to create any package-file relationships cataloger=dpkgdb-cataloger error=unable to find path for path="/usr/bin/xz": cycle during symlink resolution package=xz-utils
[0003]  WARN unable to process mimetypes=[application/x-executable application/x-mach-binary application/x-elf application/x-sharedlib application/vnd.microsoft.portable-executable]: unable to get ref
[0005]  WARN error while cataloging cataloger=graalvm-native-image-cataloger
1 error occurred:
        * failed to find binaries by mime types: unable to get ref for path="/usr/bin/xz": cycle during symlink resolution

Grype error:

% grype symlink-loop:latest
 ✔ Vulnerability DB                [no update available]
 ✔ Loaded image                                                                                                                                                                    symlink-loop:latest
 ✔ Parsed image                                                                                                                sha256:108309a3bb9c201ffe0f2c4fc1300f434485e5216d6edad8c76d6d2ec8d3e7de
 ✔ Cataloged packages              [315 packages]
[0003]  WARN unable to create any package-file relationships cataloger=dpkgdb-cataloger error=unable to find path for path="/usr/bin/xz": cycle during symlink resolution package=xz-utils
[0003]  WARN unable to process mimetypes=[application/vnd.microsoft.portable-executable application/x-executable application/x-mach-binary application/x-elf application/x-sharedlib]: unable to get ref
[0004]  WARN error while cataloging cataloger=graalvm-native-image-cataloger
1 error occurred:
        * failed to catalog: 1 error occurred:
        * failed to find binaries by mime types: unable to get ref for path="/usr/bin/xz": cycle during symlink resolution

Anything else we need to know?:

This issue was validated on latest syft/grype versions:

% syft version
Application: syft
Version:    0.94.0
BuildDate:  2023-10-20T17:21:07Z
GitCommit:  8f6bdde6662aa8050a71eadbdb7bd5a3b079a56d
GitDescription: v0.94.0
Platform:   linux/amd64
GoVersion:  go1.21.3
Compiler:   gc

% grype version
Application:         grype
Version:             0.72.0
BuildDate:           2023-10-20T18:17:05Z
GitCommit:           04df28051b7694a5e4a28fc5b2ea2068f24ef213
GitDescription:      v0.72.0
Platform:            linux/amd64
GoVersion:           go1.21.3
Compiler:            gc
Syft Version:        v0.94.0
Supported DB Schema: 5

Environment:

  • OS (e.g: cat /etc/os-release or similar):
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.9 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

Cannot authenticate to registry without specifying a specific "authority" config value

Background: In order to ensure stereoscope isn't sending a given set of credentials to a container registry other than that which the user intends, stereoscope takes an Authority config value. This is intended to prevent, for example, sending the user's Docker Hub credentials to the Amazon ECR service. The way this works is that stereoscope compares the given registry host value (e.g. 111122223333.dkr.ecr.us-west-2.amazonaws.com) and ensures it matches the given credential set's Authority value before making use of the credentials.

The registry value is sometimes made explicit by the image string (e.g. 111122223333.dkr.ecr.us-west-2.amazonaws.com/my-web-app:latest) and sometimes left implicit (e.g. ubuntu:latest) in which case the registry value is determined automatically by the go-containerregistry library.

Problem: If the user needs to authenticate with a registry, and they don't provide the correct Authority config value that corresponds to the image string they've provided (which includes the case of not providing an Authority value at all), authentication will fail because stereoscope will not use the provided credentials.

Requirement: Stereoscope shouldn't attempt to compare the registry host value with the Authority value in the case where the Authority value is an empty string. Such a case indicates that the user is expecting the credentials they provide to be applied to the registry determined by inspecting their provided image string. For example, if they are specifying registry:registry.gitlab.com/my-org/my-project:latest, the user expects the provided credentials (e.g. a username and password pair) to be sent to the registry registry.gitlab.com.

Notes to developer:

A change will be needed here: https://github.com/anchore/stereoscope/blob/main/pkg/image/registry_options.go#L27
As well as in downstream consumer config processing, e.g. here: https://github.com/anchore/grype/blob/main/internal/config/registry.go#L46
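
The requirement above could be sketched roughly as follows. This is only an illustration: the credentials type and the appliesTo method are hypothetical names, not stereoscope's actual API, and the case-insensitive host comparison is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// credentials is a hypothetical stand-in for a registry credential set.
type credentials struct {
	Authority string // registry host these credentials are scoped to; may be empty
	Username  string
	Password  string
}

// appliesTo reports whether the credentials may be sent to registryHost.
// An empty Authority is treated as "use these for whichever registry the
// image string resolves to", so no host comparison is made in that case.
func (c credentials) appliesTo(registryHost string) bool {
	if c.Authority == "" {
		return true
	}
	return strings.EqualFold(c.Authority, registryHost)
}

func main() {
	scoped := credentials{Authority: "registry.gitlab.com", Username: "u", Password: "p"}
	unscoped := credentials{Username: "u", Password: "p"}

	fmt.Println(scoped.appliesTo("registry.gitlab.com")) // true
	fmt.Println(scoped.appliesTo("index.docker.io"))     // false
	fmt.Println(unscoped.appliesTo("index.docker.io"))   // true
}
```

The trade-off is that an unscoped credential set is sent to whatever registry the image string names, which is what the requirement asks for but is also exactly what the Authority check was meant to guard against.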

Goroutine Leak in Long-Running Services with Stereoscope

What happened:
I've noticed a goroutine leak when using stereoscope in a service that needs to run continuously. The issue seems to be with connections to the Docker daemon via the Unix socket not being released.

What you expected to happen:
After continuously communicating with the Docker daemon, Stereoscope should not leave long connections unreleased.

How to reproduce it (as minimally and precisely as possible):

Here's a minimal reproducible example:

package main

import (
    "context"
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "os"
    "os/signal"
    "syscall"

    "github.com/anchore/stereoscope/internal/log"
    "github.com/anchore/stereoscope/pkg/image"

    "github.com/anchore/stereoscope"
)

func main() {
    go func() {
        http.ListenAndServe(":6060", nil)
    }()

    signalChan := make(chan os.Signal, 1)
    signal.Notify(signalChan, os.Interrupt, syscall.SIGTERM)

    go func() {
        for {
            ctx := context.TODO()

            ImageSource := image.DockerDaemonSource
            UserInput := "busybox:latest"
            img, err := stereoscope.GetImageFromSource(ctx, UserInput, ImageSource)
            if err != nil {
                panic(err)
            }
            cleanup := func() {
                if err := img.Cleanup(); err != nil {
                    log.Warnf("unable to cleanup image=%q: %v", UserInput, err)
                }
            }

            for _, layer := range img.Layers {
                fmt.Printf("layer: %s\n", layer.Metadata.Digest)
            }

            cleanup()

        }
    }()

    <-signalChan

}

Anything else we need to know?:

Environment:

OS: Linux kaze 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Docker: 20.10.21 # installed by `sudo apt install docker.io`
Stereoscope: v0.0.0-...-e14bc44 (Latest as of May 22, 2023)

Should attempt to pull docker images if possible

Currently we only use docker images that are already present on the local system; we should also attempt to pull images that cannot be found locally. There may be issues with authenticating against private registries and with sharing docker daemon credentials. Note: we should avoid storing user credentials and instead use the docker config via the Docker SDK.

Implement an event bus for observable task progress

Presentation concerns and business functionality concerns should remain strictly separate. That said, there should be a way to peer into business functionality to serve presentation. For instance, when saving a docker image from the docker daemon and storing it to a temp file, there is currently no way to present progress to the user for this potentially time-expensive action. A task like this warrants progress reporting, and logging is insufficient (and should not be used for presentation purposes anyway).

Adding an event bus would allow business objects to report when there are notable events or post rich objects which can be subsequently polled for progress. These mechanisms should be passive in nature and still honor a strict separation of concerns:

  1. the business object should not know where events are being published to
  2. no presumption of consumer or producer rates by either side
  3. polling objects published should be generic and not interfere with the business object's task.

Back to the example use case: visibility into docker image save and image caching progress. The business object would push a "progress" object once at the start of a task where progress can be observed. Consumers would read this object and continually poll it at an interval of their choosing to report on progress. This would meet the above 1-3 criteria.
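
A minimal sketch of the idea, under stated assumptions: Progress, Event, Publisher, chanBus, and saveImage are all hypothetical names invented for illustration, and the drop-when-full policy is just one way to avoid assuming anything about consumer rates.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Progress is a generic pollable object: the producer updates it,
// consumers read it whenever they like.
type Progress struct {
	current atomic.Int64
	total   int64
}

func NewProgress(total int64) *Progress { return &Progress{total: total} }

func (p *Progress) Add(n int64) { p.current.Add(n) }

func (p *Progress) Snapshot() (current, total int64) {
	return p.current.Load(), p.total
}

// Event pairs a label with the pollable object.
type Event struct {
	Type  string
	Value *Progress
}

// Publisher is all the business object sees; it does not know
// where events end up (criterion 1).
type Publisher interface {
	Publish(Event)
}

// chanBus is a buffered-channel bus. Publish never blocks, so the
// producer makes no assumption about consumer rate (criterion 2).
type chanBus struct{ events chan Event }

func (b *chanBus) Publish(e Event) {
	select {
	case b.events <- e:
	default: // buffer full: drop the event rather than stall the task
	}
}

// saveImage simulates a long-running task. It publishes one Progress
// object at the start and then only updates it (criterion 3).
func saveImage(pub Publisher, chunkSizes []int64) {
	var total int64
	for _, c := range chunkSizes {
		total += c
	}
	prog := NewProgress(total)
	pub.Publish(Event{Type: "save-image", Value: prog})
	for _, c := range chunkSizes {
		prog.Add(c)
	}
}

func main() {
	bus := &chanBus{events: make(chan Event, 8)}
	saveImage(bus, []int64{10, 20, 30})

	e := <-bus.events
	cur, total := e.Value.Snapshot()
	fmt.Printf("%s: %d/%d\n", e.Type, cur, total)
}
```

In a real consumer, the Snapshot call would sit inside a ticker loop running in its own goroutine while the task is still in flight.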

Ensure CircleCI config matches functional Docker version

Once PR #15 lands, the following section should be queued for removal as soon as the breaking change goes away:

          # work around for recent circle CI breaking change (should remove asap)
          # Error: "Error response from daemon: client version 1.39 is too new. Maximum supported API version is 1.38"
          DOCKER_API_VERSION: "1.38"

Add podman support

  • Add podman image source, enabling podman:<image> parsing
  • Add podman image provider, enabling tarball extraction from podman via its varlink API

symlinks do not return content

stereoscope does not support producing contents from symlinks (currently returning an empty string).

This happens because the current implementation tries to read the file path itself instead of realizing it is a link. It can't immediately go to a link since it may not be present in the current layer.

Although this problem happens for a squashed layer read (all files present including links), the fix should also handle the situation of being in a layer and not having access to the target.

Possible approaches:

  • ResolvedReferences is a slice of file.References to all valid link destinations, and it's up to the caller to determine the right one based on the request.
  • or -
  1. First: track the links as they are found: https://github.com/anchore/stereoscope/blob/master/pkg/image/layer.go#L48 and track them on a new attribute (links maybe?) on the Layer object https://github.com/anchore/stereoscope/blob/master/pkg/image/layer.go#L16

  2. Next: write a new reconcileLinks() function on Image that is invoked on Image.Read(), just at the end: https://github.com/anchore/stereoscope/blob/master/pkg/image/image.go#L72
    This reconcileLinks function should follow all Layer.links to the destination recursively until a normal file type is found. This information (the file.Reference) can then be captured for later reference on the FileCatalogEntry struct (https://github.com/anchore/stereoscope/blob/master/pkg/image/file_catalog.go#L14), maybe under a name like ResolvedReference. If the link doesn't reconcile (dead link), then it acts like any other file (no resolved reference), so content requests would be empty, as they should be.

  3. Lastly: when there are any "FileContent" requests, (https://github.com/anchore/stereoscope/blob/master/pkg/image/file_catalog.go#L75, and https://github.com/anchore/stereoscope/blob/master/pkg/image/file_catalog.go#L42) these functions should be updated to be able to look for the ResolvedReference field in the catalog to order the fetch request properly.
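
The link-following step in the numbered approach could look something like the sketch below. It is a simplification under stated assumptions: the catalog is a plain map rather than stereoscope's file catalog, link targets are assumed to be absolute paths, and the entry type, resolve function, and errCycle are hypothetical names. Cycle detection is included because a link chain can loop back on itself.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical catalog record; linkTarget is non-empty
// only when the entry is a link.
type entry struct {
	linkTarget string
}

var errCycle = errors.New("cycle during link resolution")

// resolve follows links starting at path until a non-link entry is found.
// It returns "" with a nil error for a dead link (no resolved reference)
// and errCycle if the chain loops.
func resolve(catalog map[string]entry, path string) (string, error) {
	seen := map[string]bool{}
	for {
		if seen[path] {
			return "", errCycle
		}
		seen[path] = true

		e, ok := catalog[path]
		if !ok {
			return "", nil // dead link: nothing to resolve to
		}
		if e.linkTarget == "" {
			return path, nil // reached a normal file
		}
		path = e.linkTarget
	}
}

func main() {
	catalog := map[string]entry{
		"/bin/sh":      {linkTarget: "/bin/busybox"},
		"/bin/busybox": {},
		"/dead":        {linkTarget: "/missing"},
		"/a":           {linkTarget: "/b"},
		"/b":           {linkTarget: "/a"},
	}

	for _, p := range []string{"/bin/sh", "/dead", "/a"} {
		target, err := resolve(catalog, p)
		fmt.Printf("%s -> %q (err: %v)\n", p, target, err)
	}
}
```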

Data Race in Image.Read

What happened:

When running (*Image).Read in parallel against different images, Go's race detector detects a data race:

WARNING: DATA RACE
Read at 0x000003523340 by goroutine 412:
  github.com/anchore/stereoscope/pkg/file.NewFileReference()
      /home/adam/src/stereoscope/pkg/file/reference.go:13 +0x384
  github.com/anchore/stereoscope/pkg/filetree.(*FileTree).AddFile()
      /home/adam/src/stereoscope/pkg/filetree/filetree.go:514 +0x374
  github.com/anchore/stereoscope/pkg/filetree.(*Builder).Add()
      /home/adam/src/stereoscope/pkg/filetree/builder.go:44 +0x14d
  github.com/anchore/stereoscope/pkg/image.layerTarIndexer.func1()
      /home/adam/src/stereoscope/pkg/image/layer.go:233 +0x62b
  github.com/anchore/stereoscope/pkg/file.NewTarIndex.func1()
      /home/adam/src/stereoscope/pkg/file/tar_index.go:47 +0x421
  github.com/anchore/stereoscope/pkg/file.IterateTar()
      /home/adam/src/stereoscope/pkg/file/tarutil.go:64 +0x338
  github.com/anchore/stereoscope/pkg/file.NewTarIndex()
      /home/adam/src/stereoscope/pkg/file/tar_index.go:55 +0x179
  github.com/anchore/stereoscope/pkg/image.(*Layer).Read()
      /home/adam/src/stereoscope/pkg/image/layer.go:114 +0xa5e
  github.com/anchore/stereoscope/pkg/image.(*Image).Read()
      /home/adam/src/stereoscope/pkg/image/image.go:218 +0xcf5
...

Previous write at 0x000003523340 by goroutine 411:
  github.com/anchore/stereoscope/pkg/file.NewFileReference()
      /home/adam/src/stereoscope/pkg/file/reference.go:13 +0x39c
  github.com/anchore/stereoscope/pkg/filetree.(*FileTree).AddFile()
      /home/adam/src/stereoscope/pkg/filetree/filetree.go:514 +0x374
  github.com/anchore/stereoscope/pkg/filetree.(*Builder).Add()
      /home/adam/src/stereoscope/pkg/filetree/builder.go:44 +0x14d
  github.com/anchore/stereoscope/pkg/image.layerTarIndexer.func1()
      /home/adam/src/stereoscope/pkg/image/layer.go:233 +0x62b
  github.com/anchore/stereoscope/pkg/file.NewTarIndex.func1()
      /home/adam/src/stereoscope/pkg/file/tar_index.go:47 +0x421
  github.com/anchore/stereoscope/pkg/file.IterateTar()
      /home/adam/src/stereoscope/pkg/file/tarutil.go:64 +0x338
  github.com/anchore/stereoscope/pkg/file.NewTarIndex()
      /home/adam/src/stereoscope/pkg/file/tar_index.go:55 +0x179
  github.com/anchore/stereoscope/pkg/image.(*Layer).Read()
      /home/adam/src/stereoscope/pkg/image/layer.go:114 +0xa5e
  github.com/anchore/stereoscope/pkg/image.(*Image).Read()
      /home/adam/src/stereoscope/pkg/image/image.go:218 +0xcf5
...

What you expected to happen:

I expect to be able to run multiple (*Image).Read in parallel safely.

How to reproduce it (as minimally and precisely as possible):

Call Image.Read from two Go unit tests marked with t.Parallel, and run with go test -race.

Anything else we need to know?:

I do have a fix that appears to address the issue. Will PR this shortly!
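
For readers unfamiliar with this class of bug: both stacks in the trace touch the same address from goroutines reading unrelated images, which suggests shared package-level state. The sketch below assumes, purely for illustration, that the shared state is an ID counter; the nextID and newID names are hypothetical, and this is not necessarily the fix mentioned above. It shows one common way to make such a counter safe, using sync/atomic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// nextID is a hypothetical package-level counter shared by all callers.
// Incrementing a plain integer here from several goroutines would be a
// data race; atomic.Uint64 makes each increment safe.
var nextID atomic.Uint64

// newID returns a process-wide unique ID.
func newID() uint64 {
	return nextID.Add(1)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				newID()
			}
		}()
	}
	wg.Wait()
	fmt.Println(nextID.Load()) // 4000
}
```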

Environment:

  • OS (e.g: cat /etc/os-release or similar):
$ cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Platform selection in docker client has unnecessary error

What happened:

❯ go run examples/basic.go docker:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387
[0000] DEBUG image: source=DockerDaemon location=busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387
[0000] DEBUG pulling docker image="busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387"
[0000] DEBUG using docker config="/Users/willmurphy/.docker/config.json"
[0000] DEBUG using docker credentials for "index.docker.io/v1/"
panic: unable to use DockerDaemon source: image has unexpected architecture "s390x", which differs from the user specified architecture "arm64"

goroutine 1 [running]:
main.main()
        /Users/willmurphy/work/stereoscope/examples/basic.go:38 +0x60c
exit status 2

This error affects syft:

❯ syft packages docker:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387           
 ⠹ Pulling image           

2023/06/01 13:52:00 error during command execution: 1 error occurred:
        * failed to construct source from user input "docker:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387": could not fetch image "busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387": scheme "docker" specified; image retrieval using scheme parsing (busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387) was unsuccessful: unable to use DockerDaemon source: image has unexpected architecture "s390x", which differs from the user specified architecture "arm64"; image retrieval without scheme parsing (docker:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387) was unsuccessful: unable to determine image source to select platform

What you expected to happen:

Since I specified an exact digest, and didn't pass --platform, I didn't expect a validation error about the user requested platform.

How to reproduce it (as minimally and precisely as possible):

go run examples/basic.go docker:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387

Anything else we need to know?:

I chose the digest based on running docker manifest inspect busybox:1.31 and choosing the digest of a platform that doesn't match my platform.

The registry provider doesn't have this error: go run examples/basic.go registry:busybox:1.31@sha256:91c15b1ba6f408a648be60f8c047ef79058f26fa640025f374281f31c8704387 works normally.

Environment:

  • OS (e.g: cat /etc/os-release or similar):
    M1 macbook pro, Darwin Kernel Version 22.4.0 arm64.
❯ docker version
Client:
 Cloud integration: v1.0.31
 Version:           23.0.5
 API version:       1.42
 Go version:        go1.19.8
 Git commit:        bc4487a
 Built:             Wed Apr 26 16:12:52 2023
 OS/Arch:           darwin/arm64
 Context:           default

Server: Docker Desktop 4.19.0 (106363)
 Engine:
  Version:          23.0.5
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.8
  Git commit:       94d3ad6
  Built:            Wed Apr 26 16:17:14 2023
  OS/Arch:          linux/arm64
  Experimental:     true
 containerd:
  Version:          1.6.20
  GitCommit:        2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc:
  Version:          1.1.5
  GitCommit:        v1.1.5-0-gf19387a
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Stereoscope pulls different images when using ``docker:`` vs ``registry:`` from multi-platform images

What happened:
On an arm machine, stereoscope produces different results when pulling a multi-platform image using docker scheme vs registry scheme.
The docker scheme correctly infers the host architecture and pulls the correct matching image whereas the registry scheme pulls only amd64/linux image when platform is not explicitly provided.

What you expected to happen:
The platform for both images should be the same, preferably the host platform.
If platform inference is not intended for an OCI/registry source, this difference should be made explicit in debug logs or in documentation.

How to reproduce it (as minimally and precisely as possible):
On an arm machine, registry scheme yields an x86 image (incorrectly)

; go run examples/basic.go registry:rust:1.42-slim-buster | grep binutils-x86
[0000] DEBUG image: source=OciRegistry location=rust:1.42-slim-buster
[0000] DEBUG pulling image info directly from registry image="rust:1.42-slim-buster"
[0000] DEBUG no registry credentials configured, using the default keychain
[0003] DEBUG image metadata: digest=sha256:165a37f9b454c1b000732bc6fb31edb6710818f6096796e7951a1e91cafa758d mediaType=application/vnd.docker.distribution.manifest.v2+json tags=[]
[0003] DEBUG layer metadata: index=0 digest=sha256:b60e5c3bcef2f42ec42648b3acf7baf6de1fa780ca16d9180f3b4a3f266fe7bc mediaType=application/vnd.docker.image.rootfs.diff.tar.gzip
[0005] DEBUG layer metadata: index=1 digest=sha256:508d8706fc1db20449e5e0fe22c65e1cce8ee30d062c5357f7e81f22cb8c3441 mediaType=application/vnd.docker.image.rootfs.diff.tar.gzip
 ...
    /usr/share/doc/binutils-x86-64-linux-gnu
 ...

On the same machine, docker scheme yields an arm image (correctly)

; go run examples/basic.go docker:rust:1.42-slim-buster | grep binutils-aarch64
[0000] DEBUG image: source=DockerDaemon location=rust:1.42-slim-buster
[0008] DEBUG image metadata: digest=sha256:8dc750c34fb3e46a4585e531416d995592fd7ca89630be76cca23ce1da26391f mediaType=application/vnd.docker.distribution.manifest.v2+json tags=[rust:1.42-slim-buster]
[0008] DEBUG layer metadata: index=0 digest=sha256:67d3a85d42a2b4ae3ca54ff7b6225bb136fbe0fa5732d9c5143470f13470bbde mediaType=application/vnd.docker.image.rootfs.diff.tar.gzip
[0008] DEBUG layer metadata: index=1 digest=sha256:081726014efb2e4d440405aee79d2cca3966c3fc965ce2cee13fd972c907e953 mediaType=application/vnd.docker.image.rootfs.diff.tar.gzip
    /usr/share/doc/binutils-aarch64-linux-gnu
...

Anything else we need to know?:
DaemonImageProvider infers host platform and pulls the correct image whereas
RegistryImageProvider uses go-containerregistry which defaults to amd64/linux when platform is missing rather than inferring the host platform.
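
One way to make both providers behave the same is to resolve the platform before handing off to either of them. The sketch below is an assumption-laden illustration: the platform type and resolvePlatform function are hypothetical, and defaulting the OS to linux (rather than the host OS) is an assumption made here because a macOS host would otherwise ask for darwin images.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// platform is a hypothetical value type for "os/arch[/variant]".
type platform struct {
	OS, Architecture, Variant string
}

// resolvePlatform parses user input of the form "os/arch" or
// "os/arch/variant". Empty input falls back to linux plus the host
// architecture instead of a fixed amd64 default.
func resolvePlatform(userInput string) (platform, error) {
	if userInput == "" {
		return platform{OS: "linux", Architecture: runtime.GOARCH}, nil
	}
	parts := strings.Split(userInput, "/")
	switch len(parts) {
	case 2:
		return platform{OS: parts[0], Architecture: parts[1]}, nil
	case 3:
		return platform{OS: parts[0], Architecture: parts[1], Variant: parts[2]}, nil
	default:
		return platform{}, fmt.Errorf("unable to parse platform %q", userInput)
	}
}

func main() {
	p, err := resolvePlatform("")
	if err != nil {
		panic(err)
	}
	fmt.Printf("default platform: %s/%s\n", p.OS, p.Architecture)
}
```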

Environment:

  • OS: MacOS Monterey, M1 Pro.

Embed fs.FileInfo in Metadata Struct?

What would you like to be added:

I'd like a way to obtain a (or ideally, preserve the original) fs.FileInfo associated with a given file.Metadata value. ModTime/AccessTime/ChangeTime were added directly to the Metadata struct in #166, which works, but that approach requires modifying the Metadata struct each time a new field is needed, and then wiring up the various sources that Stereoscope supports (TAR, Singularity/SquashFS, OS filesystem.)

I suspect it might be a better solution to simply embed an actual fs.FileInfo obtained from the source within the Metadata struct? This would allow anyone with a Metadata to get any metadata supported by the io/fs package, without requiring additional fields to be added to the Metadata struct each time. I believe it'd also allow removal of some of the source-specific handling.

Why is this needed:

I'm looking to get access to additional fields available from the source fs.FileInfo (ex. Xattrs).

Additional context:

I'd be willing to do the leg work on a PR for this, if the idea is sound. Thanks!

Add containerd support

In the same vein of supporting other container runtimes, it would be ideal to support pulling images from containerd directly.

Unable to read image using Docker daemon provider when architecture has variant

I was looking at anchore/grype#831, and it seems like this is ultimately a problem with how Stereoscope is determining "architecture" and "variant" values for a given image, particularly in the code path used for reading images using the DaemonImageProvider.

How to reproduce

With Stereoscope checked out locally, run:

go run ./examples/basic.go docker:ghcr.io/mattmoor/static@sha256:b7dcd21f108cfed6c394aa18240a26c02f904337a962ca0ffe17368de5c65a23

And you'll see:

DEBU[0000] image: source=DockerDaemon location=ghcr.io/mattmoor/static@sha256:b7dcd21f108cfed6c394aa18240a26c02f904337a962ca0ffe17368de5c65a23
panic: could not read image: unable to override metadata option: unknown architecture: arm/v7

...

Analysis

On this line, it looks like when the Docker daemon provider is attaching metadata to the image, it's setting the architecture to i.Architecture and setting the variant to "". During the problem scenario, i.Architecture is set to arm/v7, which means the variant ("v7") hasn't been separated out correctly. This causes the error to bubble up out of WithArchitecture because arm/v7 isn't in the known architectures list (but arm is).

So my first takeaway is: I think we shouldn't necessarily use "" as the variant in this code path.

But what's also interesting is that the source of the arm/v7 value is this call into the Docker client library. The return type (ImageInspect) has explicit fields for Architecture and Variant separately, so I'm not sure why it's not separating out the v7 into the variant for us so we don't have to.
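
As a stopgap on the stereoscope side, the combined value could be split before it is validated. A minimal sketch, where splitArchitecture is a hypothetical helper and not an existing stereoscope function:

```go
package main

import (
	"fmt"
	"strings"
)

// splitArchitecture separates a combined value such as "arm/v7" into
// architecture and variant. Values without a slash are returned
// unchanged with an empty variant.
func splitArchitecture(raw string) (arch, variant string) {
	arch, variant, _ = strings.Cut(raw, "/")
	return arch, variant
}

func main() {
	for _, raw := range []string{"arm/v7", "arm64", "amd64"} {
		arch, variant := splitArchitecture(raw)
		fmt.Printf("raw=%q arch=%q variant=%q\n", raw, arch, variant)
	}
}
```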

Curious for your all's thoughts! This issue ends up being a showstopper for Syft and Grype users with Apple Silicon using Docker Desktop and images with variants. 🙏

Add more Reader-like interfaces to support downstream testing

There should be a FileTreeReader to aid in downstream testing, eliminating the need for users to populate a *tree.FileTree data structure for testing, since they only need read capabilities.

Also consider Readers for Image and Layer objects.
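
To make the request concrete, here is a rough sketch of what such an interface and a test double might look like. The method set is an assumption chosen for illustration (the issue does not specify one), and fakeTree and hasAll are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// FileTreeReader is a hypothetical read-only view of a file tree.
type FileTreeReader interface {
	HasPath(path string) bool
	AllPaths() []string
}

// fakeTree is a test double backed by a set of paths, so downstream
// tests do not need to build a full file tree.
type fakeTree map[string]struct{}

func (t fakeTree) HasPath(p string) bool {
	_, ok := t[p]
	return ok
}

func (t fakeTree) AllPaths() []string {
	out := make([]string, 0, len(t))
	for p := range t {
		out = append(out, p)
	}
	sort.Strings(out) // deterministic order for tests
	return out
}

// hasAll is example downstream code that depends only on the interface.
func hasAll(r FileTreeReader, paths ...string) bool {
	for _, p := range paths {
		if !r.HasPath(p) {
			return false
		}
	}
	return true
}

func main() {
	tree := fakeTree{"/etc/os-release": {}, "/bin/sh": {}}
	fmt.Println(hasAll(tree, "/etc/os-release", "/bin/sh")) // true
	fmt.Println(tree.AllPaths())
}
```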

Stereoscope cannot inspect images in Docker Desktop

Hello!
I have created a custom Rabbit docker image and I wanted to check it for vulnerabilities using grype. The problem I'm stuck on is that grype does not identify my locally built image. For example, my custom image is named custom-rabbit, and when I call grype custom-rabbit I get the following error:

grype custom-rabbit                                                             
 ✔ Vulnerability DB        [updated]
 ✔ Pulled image            
1 error occurred:
 * failed to catalog: could not fetch image "custom-rabbit": unable to use DockerDaemon source: pull failed: Error response from daemon: pull access denied for custom-rabbit, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

I have Docker running in the background: I have checked that it's up using systemctl status docker, and my Docker Desktop app is also running. Yet I haven't found a fix. Could you look into this?

As with other images (ones from dockerhub) I have no problems with checking them, it's only the ones which are built locally.

Environment:

  • Output of grype version: 0.61.1
  • OS (e.g: cat /etc/os-release or similar): Ubuntu 22.04.2 LTS Jammy

Concurrently analyzing containers can lead to race condition causing error

What happened:
When using Grype as a library and scanning multiple containers in multiple goroutines, some of the goroutines may fail with an error like:

could not fetch image 'registry/image:tag': could not read image: failed to read layer="sha256:deadbeef" tar : open /tmp/stereoscope-cache516617993/sha256:deadbeef.tar: no such file or directory

What you expected to happen:
Multiple container images can be analyzed concurrently in the same go process.

How to reproduce it (as minimally and precisely as possible):

Scan multiple containers with grype pkg.Provide in the same process.

Working on a unit test that will reliably reproduce this.

Anything else we need to know?:
I think this issue is caused by the temp dir generator being in module scope instead of in the scope of the stereoscope image. file.Cleanup() is called at the end of each container analysis, which will delete ALL images on disk.

var tempDirGenerator = file.NewTempDirGenerator()

stereoscope/client.go

Lines 70 to 74 in 6e663af

func Cleanup() {
    if err := tempDirGenerator.Cleanup(); err != nil {
        log.Errorf("failed to cleanup: %w", err)
    }
}
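
A sketch of the alternative, where each image owns its temporary storage so that cleaning up one image cannot remove another's files. The imageCache type and its functions are hypothetical names for illustration, not stereoscope's API.

```go
package main

import (
	"fmt"
	"os"
)

// imageCache is a hypothetical per-image owner of temporary storage.
type imageCache struct {
	dir string
}

// newImageCache creates a fresh temp directory for a single image.
func newImageCache() (*imageCache, error) {
	dir, err := os.MkdirTemp("", "stereoscope-sketch-")
	if err != nil {
		return nil, err
	}
	return &imageCache{dir: dir}, nil
}

// Cleanup removes only this image's directory.
func (c *imageCache) Cleanup() error {
	return os.RemoveAll(c.dir)
}

func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

func main() {
	a, err := newImageCache()
	if err != nil {
		panic(err)
	}
	b, err := newImageCache()
	if err != nil {
		panic(err)
	}
	defer b.Cleanup()

	if err := a.Cleanup(); err != nil {
		panic(err)
	}
	// a's directory is gone, b's is untouched.
	fmt.Println(dirExists(a.dir), dirExists(b.dir)) // false true
}
```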

Environment:

  • MacOS
  • Debian

Add support for containers-storage backend for container images

What would you like to be added:
Currently syft and grype look in certain locations to find container images. This works fine if the container image is built with docker or podman, since the images they build reside in locations that are already among the supported sources here.

Images built using buildah, though, reside in a different location (usually containers-storage), and when scanning with syft, it scans the remote image instead of the locally built image.

Why is this needed:
Support is needed for scanning images built using daemonless tools like buildah. Add a new source that syft can check when scanning an image locally, before falling back to OciRegistry.

Additional context:

Add docker daemon support

  • Add docker image source, enabling docker://<image> parsing
  • Add docker image provider, enabling tarball extraction from the docker daemon

Obtain registry credentials from common credential helpers

What would you like to be added:
As a follow-up to #64, it would be ideal to support pulling directly from ECR, ACR, and GCR registries. Today there is support for providing a username and password or a token. However, there are several credential helpers that can be used so that the idioms of each toolchain can be followed (e.g. as long as you have AWS_* environment variables, the aws client would be able to get credentials for ECR on behalf of the user, not requiring the user to docker login manually).

This could include local resources too. (see anchore/syft#502)

Add file tree / catalog pattern search

Should be able to provide a path pattern (e.g. .*package.json$) or basename (package.json) and find all files in all layers that match.

There are a few modes of search:

  • path match
  • path pattern match (stretch goal)
  • basename exact match
  • basename partial / pattern match

This is to support language-based analyzers.
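
Two of these modes, sketched over a flat list of paths. This is illustrative only: byBasename and byPattern are hypothetical helpers, and a real implementation would search the file trees or catalog rather than a slice.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
)

// byBasename returns every path whose final element equals name exactly.
func byBasename(paths []string, name string) []string {
	var out []string
	for _, p := range paths {
		if path.Base(p) == name {
			out = append(out, p)
		}
	}
	return out
}

// byPattern returns every path matching the regular expression.
func byPattern(paths []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, p := range paths {
		if re.MatchString(p) {
			out = append(out, p)
		}
	}
	return out, nil
}

func main() {
	paths := []string{
		"/app/package.json",
		"/app/node_modules/left-pad/package.json",
		"/app/package.json.bak",
		"/etc/os-release",
	}

	fmt.Println(byBasename(paths, "package.json"))

	matches, err := byPattern(paths, `package\.json$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(matches)
}
```

Note that the dot is escaped in the pattern here; an unescaped `.` matches any character.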

Improve documentation

This library is the foundation of multiple container-based tools, so documentation is critical here. There should be at least:

  • API documentation (pkg.go.dev/), which implies that in-line documentation for packages, structs, interfaces, and functions need to be added.
  • Higher-level concepts: an overview of what the library achieves, high level design, and basic interactions in a markdown file in this repo would be ideal.

Wrong source when ":" character in path

What happened:
When using image.DetectSource(path) with a : in the docker archive path, UnknownSource is returned.

What you expected to happen:
DockerTarballSource should be returned (or whatever source the file actually is, if it exists).

How to reproduce it (as minimally and precisely as possible):

package main

import (
  "fmt"

  "github.com/anchore/stereoscope/pkg/image"
)

// run:
// "docker save alpine:3.14.1 > alpine:3.14.1.tar"
// "docker save alpine:3.14.1 > alpine-3.14.1.tar"

func main() {
  s1, l1, _ := image.DetectSource("alpine:3.14.1.tar")
  fmt.Printf("res1: %v-%v\n", s1, l1)

  s2, l2, _ := image.DetectSource("alpine-3.14.1.tar")
  fmt.Printf("res2: %v-%v\n", s2, l2)
}

Anything else we need to know?:
Relates to anchore/syft#927
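
One possible ordering that avoids the problem is to check whether the whole input exists on disk before trying to interpret anything before the first : as a scheme. The sketch below is not stereoscope's DetectSource; the detect function, its return values, and the short scheme list are hypothetical, and the existence check is injected so the logic can be exercised without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// detect classifies user input. It checks for an existing file first,
// so a path such as "alpine:3.14.1.tar" is never mistaken for
// "scheme:location". The scheme list is illustrative only.
func detect(userInput string, exists func(string) bool) (kind, location string) {
	if exists(userInput) {
		return "file", userInput
	}
	if scheme, rest, ok := strings.Cut(userInput, ":"); ok {
		switch scheme {
		case "docker", "registry", "podman":
			return scheme, rest
		}
	}
	return "unknown", userInput
}

// fileExists is the real-world existence check.
func fileExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	for _, in := range []string{"alpine:3.14.1.tar", "docker:alpine:3.14.1"} {
		kind, loc := detect(in, fileExists)
		fmt.Printf("%s => kind=%s location=%s\n", in, kind, loc)
	}
}
```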

Stereoscope Content API refactor

Today the API returns the contents of requested files in the form of a string. This causes memory pressure for large files. As a consumer of stereoscope, I need a means by which I can read nontrivial files without consuming large amounts of memory.

AC

  • File contents are not returned as strings.
  • File contents can be read by the consumer as needed.

Steps to test

  • Verify that files can be read from the API without needing to receive the entire file contents at once.
  • Try reading an unusually large file (e.g. many GBs). Answer: a) What's the impact on process memory? b) Is the process able to provide files whose sizes sum to a larger size than there is memory available?

Developer notes

The API should probably return an io.Reader for the requested files.

There may be multiple consumers for each file, which implies that unique readers are required for each caller.
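
A sketch of the shape this could take: instead of returning a string, the API hands back something that opens a fresh reader per caller. The Opener type, stringOpener, and readN are hypothetical names; an in-memory string stands in for a file so the example is self-contained, whereas a real implementation would presumably open the underlying tar entry or cached file on each call.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Opener returns a new, independent reader on every call, so multiple
// consumers never share a read position.
type Opener func() (io.ReadCloser, error)

// stringOpener is a stand-in content source backed by a string.
func stringOpener(content string) Opener {
	return func() (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(content)), nil
	}
}

// readN reads up to n bytes from r and returns them as a string.
// The consumer decides how much to hold in memory at a time.
func readN(r io.Reader, n int) string {
	buf := make([]byte, n)
	got, _ := io.ReadFull(r, buf)
	return string(buf[:got])
}

func main() {
	open := stringOpener("hello world")

	a, err := open()
	if err != nil {
		panic(err)
	}
	defer a.Close()

	b, err := open()
	if err != nil {
		panic(err)
	}
	defer b.Close()

	fmt.Println(readN(a, 5))  // hello
	fmt.Println(readN(b, 11)) // hello world
}
```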
