kolide / launcher

Osquery launcher, autoupdater, and packager

Home Page: https://kolide.com/launcher

License: Other

Go 88.62% Makefile 0.39% Shell 0.49% Objective-C 1.37% Dockerfile 0.15% PowerShell 0.05% C# 4.07% Augeas 4.81% C 0.04%
osquery host-instrumentation devops sysadmin grpc go-kit golang hacktoberfest

launcher's Introduction

The Kolide Agent

The Kolide Agent (aka Launcher) is a lightweight agent designed to work with Kolide's service.

It's built around osquery, but has several additional capabilities:

  • secure automatic updates
  • many additional tables
  • device identification

Documentation

Most of the documentation for how our agent works can be found online at https://www.kolide.com/docs/using-kolide/agent.

There is some additional, mostly legacy, documentation on GitHub in the docs subdirectory of the repository.

launcher's People

Contributors

0xmachos, alrs, arush15june, bcoverston, blaedj, cwhits, dependabot[bot], directionless, fritzx6, goronfreeman, groob, iamharlie, jalseth, james-pickett, jessbellon, jnog, juneezee, loganmac, markvlk, marpaia, micah-kolide, murphybytes, nicktitle, rebeccamahany, securityclippy, synapsis2112, terracatta, wstewartii, zackattack01, zwass


launcher's Issues

autoupdate issues for MVP

The autoupdate feature for the launcher/osquery binaries has been added to the launcher, but several important items remain:

launcher/updater code:

  • complete #69, which adds tooling for updating the mirror with osqueryd versions based on the osquery.io releases
  • also add tooling which creates a tar.gz file for launcher and tags the release as stable/beta/specific-version, similar to #69
  • close #59, related to the two tasks above
  • troubleshoot and resolve #66, possibly related to lack of delegate target support in the autoupdate package.
  • add delegation support to launcher/autoupdate and unpin the updater dependency (depends on #66 getting fixed)
    version: 8ff71420c45f358475e14c5c8ddf340a1985c71f
  • Add delegation support to Launcher
  • Update package-builder to include initial TUF metadata and Notary URL

infrastructure

The Notary service requires two MySQL databases (server and signer) as well as the Go dependencies. So far, I have stood up all the required infrastructure, but it is using the bundled example keys. The services and databases need to be re-launched with production-level keys.

  • generate CA and client certificates for the notary server & signer.

These are the current configs that need to be replaced:
server-config:

	"trust_service": {
		"type": "remote",
		"hostname": "notarysigner",
		"port": "7899",
		"tls_ca_file": "/go/src/github.com/docker/notary/fixtures/root-ca.crt",
		"key_algorithm": "ecdsa",
		"tls_client_cert": "/go/src/github.com/docker/notary/fixtures/notary-server.crt",
		"tls_client_key": "/go/src/github.com/docker/notary/fixtures/notary-server.key"
	},

signer-config:

	"server": {
		"grpc_addr": "0.0.0.0:7899",
		"tls_cert_file": "/go/src/github.com/docker/notary/fixtures/notary-signer.crt",
		"tls_key_file": "/go/src/github.com/docker/notary/fixtures/notary-signer.key",
		"client_ca_file": "/go/src/github.com/docker/notary/fixtures/notary-server.crt"
	},

The important part is understanding the relationship between the two services so that keys are unique for server/signer. Another important and non-trivial issue is limiting access to the root keys and securely storing them -- a company-wide 1Password vault is not appropriate.

  • re-launch the mirror and DBs from scratch once unique keys are added (relatively minor, but good hygiene).

  • add DNS entries for:

35.196.246.167 notary.kolide.co
35.196.246.167 notary.kolide.com

Documentation

  • write up a workflow for publishing new binary releases.
    Document a workflow for signing and publishing new launcher & osquery updates, such that more than one person on the team can create releases and launcher and osquery versions can be published to the mirror independently.

Testing

I tested that the launcher and osquery versions are both replaced/restarted when a new version of the binary is published to the website. It appears to work well, but there are still many possible edge cases that should be tested extensively before we enable auto-update for a customer in production.
Some possible issues:

  • specifying a beta/other channel
  • what happens if you accidentally publish the metadata before updates are available on the mirror
  • validate that if the binary release is broken, the launcher/osquery can auto-recover with the previous version, instead of failing to start forever.
  • the ability to rotate TUF keys in production without affecting the customer

Investigate RocksDB Open Errors

Every now and then, when recovering or restarting an osquery instance, I see errors from osquery like this:

2017/07/03 15:10:27 status: {"s":"0","f":"rocksdb.cpp","i":"222","m":"Rocksdb open failed (5:0) IO error: lock \/var\/folders\/wp\/6fkmvjf11gv18tdprv4g2mk40000gn\/T\/E45653D23972376DF35B\/osquery.db\/LOCK: Resource temporarily unavailable","h":"B312055D-9209-5C89-9DDB-987299518FF7","c":"Mon Jul  3 21:10:25 2017 UTC","u":"1499116225"}

Investigate this, write a test that reproduces it, figure out how to make it not happen, and make the test pass.

Best practices table doesn't return any data in osqueryi

$ make osqueryi
mkdir -p build/darwin
mkdir -p build/linux
go build -i -o build/development-extension.ext ./cmd/development-extension/
osqueryi --extension=./build/development-extension.ext
Using a virtual database. Need help, type '.help'
osquery> .all kolide_email_addresses
+----------------+-----------+
| email          | domain    |
+----------------+-----------+
| [email protected] | kolide.co |
| [email protected] | arpaia.co |
+----------------+-----------+
osquery> .all kolide_best_practices
osquery> .schema kolide_best_practices
CREATE TABLE kolide_best_practices(`filevault_enabled` INTEGER, `firewall_enabled` INTEGER, `bluetooth_sharing_disabled` INTEGER, `sip_enabled` INTEGER, `screensaver_password_enabled` INTEGER, `screen_sharing_disabled` INTEGER, `file_sharing_disabled` INTEGER, `remote_login_disabled` INTEGER, `disc_sharing_disabled` INTEGER, `internet_sharing_disabled` INTEGER, `gatekeeper_enabled` INTEGER, `printer_sharing_disabled` INTEGER, `remote_management_disabled` INTEGER, `remote_apple_events_disabled` INTEGER);
osquery>

osquery worker doesn't restart plugin

I think this is a bug. From the logs:

W0825 12:48:55.290359 106020864 watcher.cpp:286] osqueryd worker (60012): Memory limits exceeded: 209854464
E0825 12:49:09.147480 3033838528 init.cpp:568] Cannot activate kolide_grpc config plugin: Unknown registry plugin: kolide_grpc

Looks like the osqueryd worker was killed because of a memory limit, but it came back up without the extension?
The launcher was running (launchd didn't restart it) but the activity stopped.

Maybe we could do some sort of select 1 query every x seconds? Or a custom healthz table?

Build osquery version manager

API driven osquery version manifest

There should be an initial request to a remote API endpoint when the agent starts/checks-in. Perhaps something like:

POST https://api.kolide.com/v1/osquery/versions

{
  "platform": "darwin",
  "version": "10.12.2",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWV9.TJVA95OrM7E2cBab30RMHrHDcEfxjoYZgeFONFh7HgQ"
}

The server may respond with a version manifest describing the current state of the world for osquery.

{
  "stable": "1.4.0",
  "latest": "1.4.2",
  "versions": {
    "1.4.0": {
      "remote_url": "https://omg.gcp.com/1.4.0.tar.gz",
      "sha256": "651de3316fa27e74d6a9de619ebac68e3b781e9527fdd00c5ef7143b1fa581b6"
    },
    "1.4.2": {
      "remote_url": "https://omg.gcp.com/1.4.2.tar.gz",
      "sha256": "651de3316fa27e74d6a9de619ebac68e3b781e9527fdd00c5ef7143b1fa581b6"
    }
  }
}
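
As a sketch, the request and manifest above could decode into Go types like these (all names are assumptions based on the JSON, not an existing API):

type VersionRequest struct {
	Platform string `json:"platform"`
	Version  string `json:"version"`
	Token    string `json:"token"`
}

type VersionInfo struct {
	RemoteURL string `json:"remote_url"`
	SHA256    string `json:"sha256"`
}

type VersionManifest struct {
	Stable   string                 `json:"stable"`   // version to run on the stable channel
	Latest   string                 `json:"latest"`   // newest published version
	Versions map[string]VersionInfo `json:"versions"` // version -> download location and checksum
}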

Registry of binaries

In order to create an osquery version management utility, we need a remote directory of various versions of osqueryd binaries. We can probably pull these from the official osquery S3 bucket. It's possible that we could link directly to the binaries in Amazon, but for logging and availability, perhaps we want to host these ourselves.

Version resolution and download

The agent should resolve the data received from the remote version manifest against some sort of internal data structure representing what its state of the world is. Given this interaction (as well as an understanding of the user's desired actions based on env/CLI flags), the agent should have enough information to download the binaries if necessary and determine the binary path that should be running.

Isolated API

This component will be used by an osquery execution manager to determine the local path of the desired osqueryd binary. The tool should cache osqueryd binaries (to a configurable degree) and offer an internal API that allows another package to get the current osqueryd binary path. This API must account for the fact that the binary may or may not currently exist on disk (due to network issues, etc.).
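
A rough sketch of what that internal API could look like (the names here are hypothetical):

import "context"

// VersionManager caches osqueryd binaries locally and resolves the path
// of the binary that should currently be running.
type VersionManager interface {
	// BinaryPath returns the local path of the desired osqueryd binary,
	// downloading it first if it is not already cached. It returns an
	// error if the binary cannot be made available (network issues, etc).
	BinaryPath(ctx context.Context) (string, error)
}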

Add google analytics

We can use the Google Analytics code to report the current OS that osql is deployed to. IMO this would be a great way to get wide reporting on osquery use and would be simple to add.
Thoughts?

Bootstrapping TUF without validation is a potential vulnerability

https://github.com/kolide/launcher/blob/master/autoupdate/generate_tuf.go#L50

We are populating the initial local TUF repository for Launcher by downloading it directly from the Notary server. This exposes launcher to a number of vulnerabilities because none of the customary client validations are performed on this data. It is just downloaded over the internet and assumed to be valid.

I would propose instead that a user obtain a validated local copy of the TUF repository using the Notary client and bootstrap the Launcher TUF repository from the local file system. The Notary client performs all TUF validations when it creates the local repository, so this copy would be safe to use to bootstrap the Launcher copy of the repository. Note that this 'gold' copy of the TUF repository could be created once and then distributed with Launcher. Even if this repository is stale, it will be iteratively made current if autoupdate is enabled when Launcher is run.

autoupdate: updates in a loop with latest updater/tuf version

I implemented #34, but with the latest version of the updater package, the autoupdate triggers at every startup, even if the metadata is not modified.

This could be because I am not yet handling delegation, although the metadata I work with doesn't have any delegates.

It could also be a bug in one of the recent commits.

For now I pinned the vendored updater to

- package: github.com/kolide/updater
  version: 8ff71420c45f358475e14c5c8ddf340a1985c71f

which appears to be working as intended.

Creating this issue so I can track the task, without blocking #34.

Create packaging tooling

  • Adapt mac-pkg-builder to accept server hostname, install LaunchDaemon, etc.
  • Write a tool similar to mac-pkg-builder, but for debs and rpms.
  • Write automation to create packages for each of the first 1000 tenants and upload them to GCS
  • Document the format of the GCS bucket where the packages are stored

Build osquery execution manager

API

This API should expect a path to a local osqueryd binary as well as a set of data structures representing the desired osquery configuration. This component is then responsible for launching osqueryd as a subprocess, ensuring watchdog limits are sanely set, making sure the extension is working correctly, etc. The execution manager is supremely responsible for the lifecycle of the osqueryd process.
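
A condensed sketch of such an API in the functional-options style (the option names follow the LaunchOsqueryInstance calls quoted in the issues further down; the paths here are illustrative):

instance, err := osquery.LaunchOsqueryInstance(
	osquery.WithOsquerydBinary(pathToOsqueryd), // e.g. resolved by the version manager
	osquery.WithRootDirectory(rootDir),
	osquery.WithConfigPluginFlag("kolide_grpc"),
)
if err != nil {
	log.Fatalf("Error launching osquery instance: %s", err)
}
// instance.Restart() / instance.Recover() then manage the subprocess lifecycle.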

Process manager

This component should account for someone killing the osqueryd processes, etc. The osquery watchdog will enforce performance requirements, so there is no need to directly manage that, although compensating controls never hurt anybody.

Extension management

Since The Agent itself is an osquery extension, this component is responsible for initializing the internal osquery plugins and attaching them appropriately to the launched osqueryd instance.

Implement osqueryd health check

Perhaps by issuing a periodic select 1 to ensure that osqueryd is still responding to messages over the extension socket. If this fails, we could initiate a restart of osqueryd or the entire launcher.
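
A minimal sketch of that check, using the osquery-go client over the extension socket (the interval, socket path, and restart hook are placeholders):

import (
	"time"

	osquery "github.com/kolide/osquery-go"
)

// healthCheck periodically issues "select 1" over the extension socket
// and invokes restart if osqueryd stops responding.
func healthCheck(socketPath string, restart func() error) {
	for range time.Tick(60 * time.Second) {
		client, err := osquery.NewClient(socketPath, 5*time.Second)
		if err == nil {
			_, err = client.Query("select 1")
			client.Close()
		}
		if err != nil {
			// osqueryd is not responding over the socket; restart it
			// (or the entire launcher, per the discussion above).
			restart()
		}
	}
}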

use correct go-kit log log.Caller

When creating a new go-kit logger with log.DefaultCaller, the logger is created with a value of Caller(3), which prints the calling line number.

However, when we wrap the logger with level.NewFilter(logger), the caller breaks and reports level.go:63 no matter what the log line is.

{"build":"0811d14e15eeee61618b146e1fc1defdc90086a5","caller":"level.go:63","level":"info","msg":"started kolide launcher","ts":"2017-08-04T13:25:30.832074161Z","version":"0811d14-dirty"}

Initializing the logger with log.Caller(5) resolves the issue, but only for leveled logs.

We can either decorate the logger with a caller after the filters are configured, or use log.Caller(5) and enforce that all logs are leveled.
A possible solution would be to add a log helper to kolide/kit which enforces a logger with a specific level?
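
A sketch of the first option, decorating with the caller after the filter is configured (using the standard go-kit APIs):

import (
	"os"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
)

func newLogger() log.Logger {
	logger := log.NewJSONLogger(os.Stderr)
	logger = level.NewFilter(logger, level.AllowInfo())
	// Because this With is the outermost wrapper, log.DefaultCaller
	// (Caller(3)) is evaluated at the real call site instead of
	// resolving inside level.go.
	logger = log.With(logger, "caller", log.DefaultCaller)
	return logger
}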

Invalid cache causes CI to fail

Every time a build is kicked off on CI, it fails with the following error:

#!/bin/bash -eo pipefail
make test
go test -cover -v github.com/kolide/launcher/cmd/launcher github.com/kolide/launcher/cmd/mac-pkg-builder github.com/kolide/launcher/cmd/osquery-extension github.com/kolide/launcher/osquery
# github.com/kolide/launcher/osquery
osquery/best_practices_test.go:6:2: cannot find package "github.com/stretchr/testify/require" in any of:
	/go/src/github.com/kolide/launcher/vendor/github.com/stretchr/testify/require (vendor tree)
	/usr/local/go/src/github.com/stretchr/testify/require (from $GOROOT)
	/go/src/github.com/stretchr/testify/require (from $GOPATH)
FAIL	github.com/kolide/launcher/osquery [setup failed]
?   	github.com/kolide/launcher/cmd/launcher	[no test files]
?   	github.com/kolide/launcher/cmd/mac-pkg-builder	[no test files]
?   	github.com/kolide/launcher/cmd/osquery-extension	[no test files]
Makefile:23: recipe for target 'test' failed
make: *** [test] Error 1
Exited with code 2

Re-running the build without cache installs testify correctly, and the tests then run appropriately. For some reason, the cache is not being written with the correct set of dependencies, and this is causing the tests to fail.

Have agent efficiently collect and report certain metrics

In the kolide/cloud app today we are building features that display potentially rapidly changing information for a particular host, where we only care about the last reported state (bad example: wifi_survey for geolocation).

With the existing osquery TLS API we have the following options which come with various pros and cons:

Scheduled/Logged diff query with fast interval.

Pros: Efficient information communication via diffs

Cons: If the host goes offline and comes back, osqueryd will mass-post catch-up diffs buffered in RocksDB to Kolide that we don't care about, which is a massive waste of server throughput and of DB write throughput.

Distributed query

Pros: When the machine is not online the query is not continuously running, and when the machine comes online it does not spam the server with extraneous info.

Cons: Requires us to set the distributed query check interval to a lower value than we may like. Sends the results payload every time, even if the data has not changed.

Possible User Story

As a developer of kolide/cloud it would be great if we could somehow schedule diff queries that only run when the agent is actively connected to the kolide server and do not run or log when that connection is severed.

Set and enforce a max database size

Now that we're caching logs locally in BoltDB, we need to ensure that there is a reasonable (configurable?) maximum database size, so that even in variable network conditions we never risk taking up too much local disk.
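
A rough sketch of enforcing such a cap, assuming the buffered logs live in a single Bolt bucket keyed in insertion order (the bucket name and purge ratio are placeholders):

import (
	"os"

	"github.com/boltdb/bolt"
)

// enforceMaxSize drops the oldest buffered logs once the database file
// exceeds maxBytes. Bolt never shrinks its file, but freed pages are
// reused by subsequent writes, so this bounds growth.
func enforceMaxSize(db *bolt.DB, maxBytes int64) error {
	fi, err := os.Stat(db.Path())
	if err != nil {
		return err
	}
	if fi.Size() <= maxBytes {
		return nil
	}
	return db.Update(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("logs")) // bucket name is an assumption
		if b == nil {
			return nil
		}
		// Delete the oldest ~10% of entries.
		n := b.Stats().KeyN / 10
		c := b.Cursor()
		for k, _ := c.First(); k != nil && n > 0; k, _ = c.Next() {
			if err := c.Delete(); err != nil {
				return err
			}
			n--
		}
		return nil
	})
}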

Error opening extension socket in tests

this test fails half the time on CI

=== RUN   TestRestart
2017/07/06 14:12:43 init: osqueryd
--- FAIL: TestRestart (6.60s)
	Error Trace:	runtime_test.go:140
	Error:		Received unexpected error dial unix /tmp/AFAA5AC4391420F24E7A/osquery.sock: connect: no such file or directory
			opening socket transport
			github.com/kolide/launcher/vendor/github.com/kolide/osquery-go.NewClient
				/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/client.go:30
			github.com/kolide/launcher/vendor/github.com/kolide/osquery-go.NewExtensionManagerServer
				/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/server.go:93
			github.com/kolide/launcher/osquery.LaunchOsqueryInstance
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:293
			github.com/kolide/launcher/osquery.(*OsqueryInstance).relaunchAndReplace
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:486
			github.com/kolide/launcher/osquery.(*OsqueryInstance).Restart
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:429
			github.com/kolide/launcher/osquery.TestRestart
				/go/src/github.com/kolide/launcher/osquery/runtime_test.go:140
			testing.tRunner
				/usr/local/go/src/testing/testing.go:754
			runtime.goexit
				/usr/local/go/src/runtime/asm_amd64.s:2337
			could not create extension manager server at /tmp/AFAA5AC4391420F24E7A/osquery.sock
			github.com/kolide/launcher/osquery.LaunchOsqueryInstance
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:300
			github.com/kolide/launcher/osquery.(*OsqueryInstance).relaunchAndReplace
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:486
			github.com/kolide/launcher/osquery.(*OsqueryInstance).Restart
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:429
			github.com/kolide/launcher/osquery.TestRestart
				/go/src/github.com/kolide/launcher/osquery/runtime_test.go:140
			testing.tRunner
				/usr/local/go/src/testing/testing.go:754
			runtime.goexit
				/usr/local/go/src/runtime/asm_amd64.s:2337
			could not launch new osquery instance
			github.com/kolide/launcher/osquery.(*OsqueryInstance).relaunchAndReplace
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:489
			github.com/kolide/launcher/osquery.(*OsqueryInstance).Restart
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:429
			github.com/kolide/launcher/osquery.TestRestart
				/go/src/github.com/kolide/launcher/osquery/runtime_test.go:140
			testing.tRunner
				/usr/local/go/src/testing/testing.go:754
			runtime.goexit
				/usr/local/go/src/runtime/asm_amd64.s:2337
			could not relaunch osquery instance
			github.com/kolide/launcher/osquery.(*OsqueryInstance).Restart
				github.com/kolide/launcher/osquery/_test/_obj_test/runtime.go:431
			github.com/kolide/launcher/osquery.TestRestart
				/go/src/github.com/kolide/launcher/osquery/runtime_test.go:140
			testing.tRunner
				/usr/local/go/src/testing/testing.go:754
			runtime.goexit
				/usr/local/go/src/runtime/asm_amd64.s:2337
		

FAIL
coverage: 66.0% of statements
FAIL	github.com/kolide/launcher/osquery	19.366s
Makefile:54: recipe for target 'test' failed

Permissions changed on binaries for launcher

I believe that #93 changed the permissions on the binaries-for-launcher bucket in the kolide-website project.

Consider my gcloud authentication information:

$ gcloud config list
Your active configuration is: [default]

[compute]
zone = us-east1-c
[core]
account = [email protected]
disable_usage_reporting = True
project = kolide-website

Now, when I try to build a package, the osquery tarball "has an invalid header":

$ ./build/package-builder make --hostname="localhost:5000" --enroll_secret="abcd"
could not generate packages: could not generate macOS package: could not fetch path to osqueryd binary: couldn't untar package: autoupdate: create gzip reader from /tmp/package-builder_cache398299770/kolide/osqueryd/darwin/osqueryd-stable.tar.gz: gzip: invalid header

Catting that file:

$ cat /tmp/package-builder_cache398299770/kolide/osqueryd/darwin/osqueryd-stable.tar.gz
<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous users does not have storage.objects.get access to object binaries-for-launcher/kolide/osqueryd/darwin/osqueryd-stable.tar.gz.</Details></Error>%

Before I go trampling permissions on that bucket, can someone explain what changed recently?

macOS Packaging MVP

This is a master-task for the MVP of packaging on macOS.

Blockers

TODO

  • Update the gRPC hostname to be localhost:8082 when the app is being run on http://localhost:5000
  • Add the --grpc_insecure flag to the launcher's LaunchDaemon template when the app is being run on http://localhost:5000
  • Update the gRPC hostname to be master-grpc.cloud.kolide.net:443 when the app is being run on https://master.cloud.kolide.net
  • Update the gRPC hostname to be 123.cloud.kolide.net:443 when the app is being run on https://123.cloud.kolide.net
  • Write a subcommand to produce production packages given the production key
  • Update the gRPC hostname to be launcher.kolide.com:443 when the app is being run on https://kolide.com
  • Add the ability to sign mac packages to package-builder
  • Shard launcher root directories by gRPC server hostname

Acceptance Tests

  • Use the setup guide to download and install a macOS package from the app running locally and verify that the host connects to the instance.
  • Use the setup guide to download and install a macOS package from the app running from a PR deployment and verify that the host connects to the instance.
  • Use the setup guide to download and install a macOS package from the app running on the master deployment and verify that the host connects to the instance.
  • Use the setup guide to download and install a macOS package from the app running on the production deployment and verify that the host connects to the instance.

Refactor kolide_best_practices to be aware of query context

Right now, every execution of select foo from kolide_best_practices; executes every best practices query. The implementation should be refactored to be aware of the supplied query context to ensure only the required columns are calculated and returned.

For example, running the following query:

SELECT sip_enabled FROM kolide_best_practices;

should be just a light proxy on top of:

SELECT enabled AS compliant FROM sip_config WHERE config_flag='sip'
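
A sketch of the dispatch shape this implies: map each column to its own check and run only the checks a query needs. Here requestedColumns, sipEnabled, and filevaultEnabled are hypothetical helpers; deriving the touched columns from osquery-go's table.QueryContext is the open part of this task.

import (
	"context"

	"github.com/kolide/osquery-go/plugin/table"
)

// Each column gets its own check function, e.g. sipEnabled would wrap
// the sip_config query shown above.
var checks = map[string]func() (string, error){
	"sip_enabled":       sipEnabled,
	"filevault_enabled": filevaultEnabled,
	// ...one entry per column...
}

func generateBestPractices(ctx context.Context, qc table.QueryContext) ([]map[string]string, error) {
	row := map[string]string{}
	for _, col := range requestedColumns(qc) { // hypothetical helper
		check, ok := checks[col]
		if !ok {
			continue
		}
		val, err := check()
		if err != nil {
			return nil, err
		}
		row[col] = val
	}
	return []map[string]string{row}, nil
}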

Build packaging tooling

For each build of the agent, we will have to compile a binary for a platform, pull in the latest osqueryd binaries, and bundle it all up into an operating system package that is ready to be modified with keying material for customers. Fortunately, if we don't rely on CGO, we can build all operating system packages from a Mac by using GOOS to cross-compile binaries (assuming we have a working directory of pre-built osqueryd binaries for the various platforms).

Supported formats

We should be able to build packages in the following format for our customers:

  • PKG
  • RPM
  • DEB
  • MSI

Kolide Best Practices Table

Master task to build out the kolide_best_practices table, which is a single-row table with the following boolean columns on macOS:

  • automatic_updates_enabled
  • filevault_enabled
  • system_firewall_enabled
  • sip_configured
  • gatekeeper_enabled
  • password_required_from_screensaver

The current implementation of the table exists at osquery/best_practices.go.

Expose the autoupdate settings to package-builder

Similarly to how you can call package-builder make with --insecure and the resulting launcher command will be run with --insecure, we need to expose setting (or disabling) the --autoupdate flag as well. Right now, since --autoupdate defaults to false, we need this flag to create packages which have autoupdate enabled; in the future, someone might want to use this to disable autoupdate if it becomes the default.

Additionally, pending the completion of #155, the update channel should also be configurable (with a default of "stable").

package versions should auto-increment.

Right now we build a package like launcher-darwin-0.1.0-20-g402d86a.pkg, which takes into account the git tag but also uses the git SHA.
Building a new package without incrementing the git tag would update the SHA but leave the numeric version the same. This is problematic, as package managers rely on incrementing version numbers to decide whether a copy in the repository is newer or older than the installed package.

Affects munki, apt and yum.

Two possible solutions to fix this:

  1. use a different version scheme for packages. maybe add time.Now().Unix() to the package version?
  2. always increment build tags before releasing a new production package.
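
A sketch of option 1 from the list above, appending a build timestamp so the package version strictly increases even when the git tag is unchanged:

import (
	"fmt"
	"time"
)

// packageVersion appends a Unix timestamp to the git-describe string,
// e.g. "0.1.0-20-g402d86a" -> "0.1.0-20-g402d86a.1504040400", so each
// new build of the same tag sorts newer for munki/apt/yum.
func packageVersion(gitDescribe string) string {
	return fmt.Sprintf("%s.%d", gitDescribe, time.Now().Unix())
}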

launcher stops reporting to the server after a "Unavailable" grpc error

A user reports that once the Unavailable desc = transport is closing error happens, the launcher stops communicating with the server.

There could be several issues at play (including server-side ones), but it's important to first eliminate the launcher as a probable cause. We need to verify whether the error causes the gRPC connection to be closed completely (not just the request failing) and either re-dial or fatal.

log for reference:

I0912 21:06:10.245666 253878272 distributed.cpp:138] Executing distributed query: kolide:populate:storage: SELECT b.label, m.device_alias, m.device, m.blocks, m.blocks_available, m.blocks_size, m.path, m.type as filesystem, d.encrypted, d.type as encryption_type, d.uid as encryption_uid, d.user_uuid as encryption_user_uuid FROM mounts m LEFT OUTER JOIN block_devices b ON m.device_alias = b.name LEFT OUTER JOIN disk_encryption d ON m.device_alias = d.name;
I0912 21:06:10.304114 253878272 distributed.cpp:138] Executing distributed query: kolide:populate:uptime: SELECT total_seconds FROM uptime;
I0912 21:06:10.305757 253878272 distributed.cpp:138] Executing distributed query: kolide:populate:version: SELECT name, major, minor, patch, build, version, platform, platform_like, codename FROM os_version;
I0912 21:06:10.307452 253878272 distributed.cpp:138] Executing distributed query: kolide:populate:wifi_networks: SELECT * FROM wifi_survey;
{"caller":"level.go:63","err":"sending status logs: writing buffered logs: writing logs: transport error sending logs: rpc error: code = Unavailable desc = transport is closing","level":"info","ts":"2017-09-12T22:06:31.537321819Z"}
{"caller":"level.go:63","err":"sending status logs: writing buffered logs: writing logs: transport error sending logs: rpc error: code = Unavailable desc = transport is closing","level":"info","ts":"2017-09-12T22:25:37.708667224Z"}

create a dev server for testing the launcher

Right now it's not possible to start the launcher without a working gRPC server. Not a huge deal, because we have grpc support in the rails repo, but ideally, the two projects can be tested in isolation.

Isolating the dependencies will make it easier to continue developing features which don't depend on the server, like managing the osquery runtime or autoupdate.

Generate dev packages

I have been doing this, but my home internet is down, so I am in need of some assistance, since this is a very network-heavy operation.

If you're not sure how, talk to me or @groob for help on installing the Kolide developer certificate locally.


Next, run the package-builder tool to generate the development packages. From the root of the repository:

make package-builder
./build/package-builder dev --debug --mac_package_signing_key="Developer ID Installer: Kolide Inc (YZ3EM74M78)"

Right now, the PR range to build is hard-coded in package-builder.go. Ideally, I'd like this to be parameterized (or, better, calculated), but the current range (400-600) is good for the range of cloud PRs that need generating (8/28/2017).

handle ResourceExhausted errors

If the server is overloaded, it will return a ResourceExhausted gRPC error.

This indicates we should retry the connection with a backoff timer.

{"caller":"level.go:63","err":"rpc error: code = ResourceExhausted desc = ","level":"debug","method":"RequestQueries","reauth":false,"res":"null","took":"206.00176ms","ts":"2017-08-30T22:09:58.813502717Z","uuid":"f2b9a22a-9417-40e2-8463-dac98f1be734"}

Startup sometimes fails on Linux

Startup fails with the following message:

{"caller":"level.go:63","launching osquery instance: could not create extension manager server at /tmp/osquery.sock: opening socket transport: dial unix /tmp/osquery.sock: connect: connection refused":"(MISSING)","level":"info","ts":"2017-09-14T00:31:55.791375316Z"}

Build gRPC-powered osquery extension

Using the osquery Go bindings at https://github.com/kolide/osquery-golang, we need to implement the complete set of osquery plugin APIs (logger, config, distributed read, distributed log, carve, etc.) using gRPC as the transport to a remote server.

Schema and development server

This component should include a service definition in protobuf's IDL that defines the service that must be implemented by the server. Even though the agent will largely be a client of the gRPC connection here, a reference implementation of the server should be implemented for testing end-to-end operations.

Scope of transport

It should be noted that the osquery extension methods are not the only components of the agent that will require use of the gRPC service client. The ability for other independent parts of the agent to access and communicate with the remote server should be considered and accounted for.

Execution environment

This extension will be launched and managed by the osquery execution manager in a to-be-determined format/API.

create versioning strategy for the launcher binary

As we deploy the launcher to production, we need to start tracking which versions are out there.
For that to happen, we need to come up with a versioning strategy for creating/tagging releases, especially when creating a new package.

We could designate stable/beta/nightly on a set schedule (like a beta becomes stable every 2 weeks).

We should also add a launcher_info table in osquery so that we can run that query in our SaaS product.
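
A minimal sketch of that launcher_info table with osquery-go (the column set is illustrative; version and build would be stamped in at link time):

import (
	"context"

	"github.com/kolide/osquery-go/plugin/table"
)

// launcherInfoTable returns a single-row table describing the running
// launcher, so the SaaS product can query it like any other table.
func launcherInfoTable(version, build string) *table.Plugin {
	columns := []table.ColumnDefinition{
		table.TextColumn("version"),
		table.TextColumn("build"),
	}
	generate := func(ctx context.Context, qc table.QueryContext) ([]map[string]string, error) {
		return []map[string]string{{"version": version, "build": build}}, nil
	}
	return table.NewPlugin("launcher_info", columns, generate)
}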

Add osquery flagfile option

We start osqueryd with an exec command and pass all configuration via CLI flags.
Sometimes it is necessary to adjust the configuration osqueryd launches with, but that currently means modifying the launcher source code -- for example, to start osqueryd with the --tls_dump, --verbose, or --debug flags.

I propose we write the osqueryd flags into /etc/kolide/flags and allow osquery to be restarted with adjusted options.

Thoughts?
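
osqueryd already supports a --flagfile option, so a minimal sketch of the proposal is for launcher to pass the well-known path through when building the exec command (the path is the one proposed above):

import "os/exec"

// osquerydCmd builds the osqueryd command, sourcing any extra flags an
// operator drops into /etc/kolide/flags ahead of launcher's own flags.
func osquerydCmd(osquerydPath string, launcherFlags []string) *exec.Cmd {
	args := append([]string{"--flagfile=/etc/kolide/flags"}, launcherFlags...)
	return exec.Command(osquerydPath, args...)
}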

Packaging: location of binary and configuration on system

I'd like to start working on packaging for different platforms. Do we have an idea of where we'd like the binary to end up? What about configuration?

How does this sound for POSIX?

  • /usr/local/bin/osql for the binary
  • /etc/kolide/ for configuration (if we have any?)
  • /var/log/kolide for osquery AND osql log files?

Table Request - Quarantine_Events

The SQLite DB stored at ~/Library/Preferences/com.apple.LaunchServices.QuarantineEventsV2 contains an enumerated list of every file ever downloaded to a Mac.

This table could be used to identify malicious software downloads, when they occurred, the source of the file etc.


Privacy Concerns:

This table includes every file ever downloaded, private browsing or not (unless it has been manually deleted from this DB).

It would need very strong auditing rules/oversight to prevent abuse as it borders on browsing-history levels of invasiveness.

calling Recover on a healthy instance panics

to repro:

diff --git a/cmd/launcher/launcher.go b/cmd/launcher/launcher.go
index a9049b5..47b8bfd 100644
--- a/cmd/launcher/launcher.go
+++ b/cmd/launcher/launcher.go
@@ -130,7 +130,7 @@ func main() {
                defer launcherUpdater.Stop()
        }

-       if _, err := osquery.LaunchOsqueryInstance(
+       instance, err := osquery.LaunchOsqueryInstance(
                osquery.WithOsquerydBinary(opts.osquerydPath),
                osquery.WithRootDirectory(opts.rootDirectory),
                osquery.WithConfigPluginFlag("kolide_grpc"),
@@ -139,10 +139,15 @@ func main() {
                osquery.WithOsqueryExtensionPlugin(logger.NewPlugin("kolide_grpc", osquery.LogString)),
                osquery.WithStdout(os.Stdout),
                osquery.WithStderr(os.Stderr),
-       ); err != nil {
+       )
+       if err != nil {
                log.Fatalf("Error launching osquery instance: %s", err)
        }

+       if err := instance.Recover(); err != nil {
+               log.Fatal(err)
+       }
+
        sig := make(chan os.Signal)
        signal.Notify(sig, os.Interrupt)
        <-sig

Apparently o.cmd is nil inside Recover. Not sure why yet.

implement 'pause' for hosts via agent

From the kolide/cloud UI, a user would be able to 'pause' a host's reporting up to cloud.

concept

'pause' is the core concept because:

  • the host would NOT stop checking in, so that if a user decides to 'unpause' the host, they'd pull this info down in a subsequent configuration
  • this host would be temporarily removed from live/scheduled querying and workspaces, but data about the host would be kept

questions

  • what do we do about data that osqueryd generated while the host was paused? does that fill up RocksDB, or do we discard it?
  • do we count hosts against host quotas while paused?

Launcher appears to send status logs as snapshot

I have this Go method implementing PublishLogs:

func (svc *HostService) PublishLogs(ctx context.Context, nodeKey string, logType logger.LogType, logs []string) (string, string, bool, error) {
	_, invalid, err := svc.authenticateHost(ctx, nodeKey)
	if err != nil {
		return "", "", invalid, errors.Wrap(err, "authenticate to publishing logs")
	}

	if logType == logger.LogTypeStatus {
		return "", "", false, nil
	}
	fmt.Println(logType)
	for _, l := range logs {
		fmt.Println(l)
	}

	return "", "", false, nil
}

And I got back the following output:

snapshot
{"s":"1","f":"init.cpp","i":"649","m":"Error reading config: error getting config: loading config failed, no cached config: transport error retrieving config: rpc error: code = Unavailable desc = transport is closing","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 14:43:33 2017 UTC","u":"1503845013"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: kernel: Cannot access \/dev\/osquery","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 14:43:33 2017 UTC","u":"1503845013"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: scnetwork: Publisher not used","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 14:43:33 2017 UTC","u":"1503845013"}
{"s":"1","f":"init.cpp","i":"649","m":"Error reading config: error getting config: loading config failed, no cached config: transport error retrieving config: rpc error: code = Unavailable desc = grpc: the connection is unavailable","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:32:39 2017 UTC","u":"1503847959"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: kernel: Cannot access \/dev\/osquery","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:32:39 2017 UTC","u":"1503847959"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: scnetwork: Publisher not used","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:32:39 2017 UTC","u":"1503847959"}
{"s":"1","f":"init.cpp","i":"649","m":"Error reading config: error getting config: loading config failed, no cached config: transport error retrieving config: rpc error: code = Unavailable desc = grpc: the connection is unavailable","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:33:33 2017 UTC","u":"1503848013"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: kernel: Cannot access \/dev\/osquery","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:33:33 2017 UTC","u":"1503848013"}
{"s":"0","f":"events.cpp","i":"824","m":"Event publisher failed setup: scnetwork: Publisher not used","h":"FA01680E-98CA-5557-8F59-7716ECFEE964","c":"Sun Aug 27 15:33:33 2017 UTC","u":"1503848013"}

Add performance monitoring to launcher's osquery runtime

As a result of #95, we disabled osquery's watchdog functionality in #103. The underlying reason why the extension wouldn't start up is that when the launcher starts an osquery instance, a fake shell binary is launched as the extension; the launcher itself then registers with the osqueryd process, providing the plugins. When osqueryd restarts itself, it relaunches the extension binary, but since this is just a shell binary, the launcher has no way of knowing to re-register the extension plugins. Thus, we disable the watchdog, since it does nothing but harm the health of the instance. As a result, however, we need to add our own performance checks and logging to the launcher to ensure that osquery stays well-behaved.
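
One possible shape for such a check, as a sketch: since launcher already holds an extension socket client, it could periodically ask osqueryd about its own footprint and restart it past a threshold (the query, threshold, and restart policy are all assumptions):

import (
	"fmt"
	"strconv"

	osquery "github.com/kolide/osquery-go"
)

// exceedsMemoryLimit reports whether osqueryd's resident set size has
// grown past maxRSSBytes, using osquery's own processes table.
func exceedsMemoryLimit(client *osquery.ExtensionManagerClient, pid int, maxRSSBytes int64) (bool, error) {
	resp, err := client.Query(fmt.Sprintf("SELECT resident_size FROM processes WHERE pid = %d", pid))
	if err != nil {
		return false, err
	}
	if len(resp.Response) == 0 {
		return false, nil // process row not found
	}
	rss, err := strconv.ParseInt(resp.Response[0]["resident_size"], 10, 64)
	if err != nil {
		return false, err
	}
	return rss > maxRSSBytes, nil
}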

Handling stale configs

How should stale configs be handled? Should launcher/osquery refuse to start up if the cached config is older than some threshold? Is there a remediation mechanism?

Slice bounds out of range panic in distributed write plugin

I0828 11:32:11.731703 191680512 distributed.cpp:138] Executing distributed query: kolide:populate:startup_items: SELECT si.name, f.path, si.type, si.source, si.status, si.username, f.btime, f.mtime FROM startup_items si LEFT OUTER JOIN file f USING (path);
I0828 11:32:11.758594 191680512 distributed.cpp:138] Executing distributed query: kolide:populate:storage: SELECT b.label, m.device_alias, m.device, m.blocks, m.blocks_available, m.blocks_size, m.path, m.type as filesystem, d.encrypted, d.type as encryption_type, d.uid as encryption_uid, d.user_uuid as encryption_user_uuid FROM mounts m LEFT OUTER JOIN block_devices b ON m.device_alias = b.name LEFT OUTER JOIN disk_encryption d ON m.device_alias = d.name;
I0828 11:32:11.802568 191680512 distributed.cpp:138] Executing distributed query: kolide:populate:uptime: SELECT total_seconds FROM uptime;
I0828 11:32:11.828732 191680512 distributed.cpp:138] Executing distributed query: kolide:populate:version: SELECT name, major, minor, patch, build, version, platform, platform_like, codename FROM os_version;
2017/08/28 11:32:11 panic in processor: runtime error: slice bounds out of range: goroutine 2828 [running]:
runtime/debug.Stack(0xc4205c88d0, 0x14d6ec0, 0x1803c40)
	/usr/local/go/src/runtime/debug/stack.go:24 +0x79
github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift.(*TSimpleServer).processRequests.func1()
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift/simple_server.go:186 +0x5a
panic(0x14d6ec0, 0x1803c40)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
encoding/json.(*decodeState).unmarshal.func1(0xc4205c9bd0)
	/usr/local/go/src/encoding/json/decode.go:170 +0xea
panic(0x14d6ec0, 0x1803c40)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
encoding/json.(*decodeState).unmarshal.func1(0xc4205c96d8)
	/usr/local/go/src/encoding/json/decode.go:170 +0xea
panic(0x14d6ec0, 0x1803c40)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed.(*OsqueryInt).UnmarshalJSON(0xc4202f3640, 0xc420180e3a, 0x1, 0x6c6, 0xc4202a7e00, 0x1e07ca8)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed/distributed.go:110 +0x254
encoding/json.(*decodeState).literalStore(0xc4201dcfc0, 0xc420180e3a, 0x1, 0x6c6, 0x14a71c0, 0xc4202f3640, 0x182, 0x1010f00)
	/usr/local/go/src/encoding/json/decode.go:832 +0x27de
encoding/json.(*decodeState).literal(0xc4201dcfc0, 0x14a71c0, 0xc4202f3640, 0x182)
	/usr/local/go/src/encoding/json/decode.go:799 +0xdf
encoding/json.(*decodeState).value(0xc4201dcfc0, 0x14a71c0, 0xc4202f3640, 0x182)
	/usr/local/go/src/encoding/json/decode.go:405 +0x32e
encoding/json.(*decodeState).object(0xc4201dcfc0, 0x14d36c0, 0xc4203b0f38, 0x195)
	/usr/local/go/src/encoding/json/decode.go:733 +0x12d8
encoding/json.(*decodeState).value(0xc4201dcfc0, 0x14d36c0, 0xc4203b0f38, 0x195)
	/usr/local/go/src/encoding/json/decode.go:402 +0x2f4
encoding/json.(*decodeState).object(0xc4201dcfc0, 0x1498940, 0xc4203b0f30, 0x16)
	/usr/local/go/src/encoding/json/decode.go:733 +0x12d8
encoding/json.(*decodeState).value(0xc4201dcfc0, 0x1498940, 0xc4203b0f30, 0x16)
	/usr/local/go/src/encoding/json/decode.go:402 +0x2f4
encoding/json.(*decodeState).unmarshal(0xc4201dcfc0, 0x1498940, 0xc4203b0f30, 0x0, 0x0)
	/usr/local/go/src/encoding/json/decode.go:184 +0x21a
encoding/json.Unmarshal(0xc42017e000, 0x3262, 0x3500, 0x1498940, 0xc4203b0f30, 0x12ba1cc, 0x14a7840)
	/usr/local/go/src/encoding/json/decode.go:104 +0x148
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed.(*ResultsStruct).UnmarshalJSON(0xc4203b0ef0, 0xc42017e000, 0x3262, 0x3500, 0xc4201a5800, 0x1e07c80)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed/distributed.go:139 +0x172
encoding/json.(*decodeState).object(0xc4201dcea0, 0x14d8780, 0xc4203b0ef0, 0x16)
	/usr/local/go/src/encoding/json/decode.go:598 +0x1c88
encoding/json.(*decodeState).value(0xc4201dcea0, 0x14d8780, 0xc4203b0ef0, 0x16)
	/usr/local/go/src/encoding/json/decode.go:402 +0x2f4
encoding/json.(*decodeState).unmarshal(0xc4201dcea0, 0x14d8780, 0xc4203b0ef0, 0x0, 0x0)
	/usr/local/go/src/encoding/json/decode.go:184 +0x21a
encoding/json.Unmarshal(0xc42017e000, 0x3262, 0x3500, 0x14d8780, 0xc4203b0ef0, 0x3500, 0x193e000)
	/usr/local/go/src/encoding/json/decode.go:104 +0x148
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed.(*Plugin).Call(0xc420120f40, 0x17d70e0, 0xc4200101d8, 0xc420374ba0, 0xc4201e8508, 0x1, 0x0, 0x4)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/plugin/distributed/distributed.go:232 +0x290
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go.(*ExtensionManagerServer).Call(0xc42018a310, 0xc4203b0eb1, 0xb, 0xc4203b0ec0, 0xb, 0xc420374ba0, 0xc4201bfe60, 0xc4201bfea2, 0x4)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/server.go:228 +0x119
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/gen/osquery.(*extensionProcessorCall).Process(0xc420192360, 0xc400000000, 0x17de000, 0xc4201bfe60, 0x17de000, 0xc4201bfef0, 0xc4201ca678, 0x1028f4e, 0xc420190080)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/gen/osquery/osquery.go:1365 +0x351
github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/gen/osquery.(*ExtensionProcessor).Process(0xc420189420, 0x17de000, 0xc4201bfe60, 0x17de000, 0xc4201bfef0, 0x0, 0x0, 0xc42001a000)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/github.com/kolide/osquery-go/gen/osquery/osquery.go:1284 +0x306
github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift.(*TSimpleServer).processRequests(0xc42006cc00, 0x17d99a0, 0xc420374b10, 0x0, 0x0)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift/simple_server.go:201 +0x265
github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift.(*TSimpleServer).AcceptLoop.func1(0xc42006cc00, 0x17d99a0, 0xc420374b10)
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift/simple_server.go:142 +0x79
created by github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift.(*TSimpleServer).AcceptLoop
	/Users/marpaia/go/src/github.com/kolide/launcher/vendor/git.apache.org/thrift.git/lib/go/thrift/simple_server.go:145 +0xfb
I0828 11:32:14.319005 192217088 scheduler.cpp:75] Executing scheduled query pack:kolide:host_info:all:storage: SELECT b.label, m.device_alias, m.device, m.blocks, m.blocks_available, m.blocks_size, m.path, m.type as filesystem, d.encrypted, d.type as encryption_type, d.uid as encryption_uid, d.user_uuid as encryption_user_uuid FROM mounts m LEFT OUTER JOIN block_devices b ON m.device_alias = b.name LEFT OUTER JOIN disk_encryption d ON m.device_alias = d.name;
