Config validation now happens within building the actual sink but this should be pushe

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I've got a branch <a href="https://github.com/timberio/vector/tree/improve-config-vali

Would --validate be better than <code class="notransl

Improve config validation about vector HOT 17 CLOSED

LucioFranco commented on May 5, 2024

Improve config validation

from vector.

Comments (17)

binarylogic commented on May 5, 2024 1

That makes sense. IMO the purpose of --dry-run should be to indicate any problems before starting Vector. So it's current behavior makes sense to me. It'd be nice to break down what is happening here into steps:

lint (valid toml syntax, valid Vector fields, etc)
graph (at least 1 source, all inputs are valid, no circular dependencies, etc)
environment (data_dir exists, is writable, etc)

Then we could support a flag like:

vector --config vector.toml --dry-run --check-depth=lint

Just throwing out an alternative. This would let the user configure how "deep" the configuration checking goes.

from vector.

lukesteensen commented on May 5, 2024

We do have a spot for this and do it in a few sinks already. See the http sink for an example.

from vector.

binarylogic commented on May 5, 2024

@LucioFranco is this still relevant?

from vector.

LucioFranco commented on May 5, 2024

Yes, I would like to keep this open. Ideally, all config options that can have validation should be pushed into serde types so that config validation issues can be exposed during config parsing, not during sink creation. This would provide the best UX.

from vector.

lukesteensen commented on May 5, 2024

@LucioFranco I'm not sure how coupling validation with parsing would provide a better UX. Can you give some more concrete examples of UX problems you think that change would solve?

from vector.

LucioFranco commented on May 5, 2024

@lukesteensen I think for config validation is a good example. There may be points where we want to ensure for CI for example that configs are valid. Extracting as much validation around that to serde would allow us to decouple any materialized sink creation from checking for a valid configuration. Serde is powerful enough to handle a lot of this for us and I personally think we should take advantage. Example of something I think we could move to serde errors would be something like URI. There is a bunch of static checking that is done via Uri::parse and is returned usually during build in the sink. This could easily be moved to a serde error so that this error is surfaced before any sinks are attempted to be created. The logs would be earlier as well which I think is a good plus too.

from vector.

Jeffail commented on May 5, 2024

I think stricter validation during --dry-run is a good thing but as it currently stands we're in a pretty good place. The vast majority of user errors are going to be typos that will be caught with the current checks.

Things I think we could improve:

Validation currently seems to check for things such as the existence of the data_dir, it would be good to suppress errors related to our runtime environment when performing dry runs since it might not be possible to fully replicate our target environment in our CI.

We also ought to spit out the config file path in an error log when validation fails. The reason this would be nice is when we're validating a large directory of vector configs we get a huge amount of INFO log spam, e.g.:

$ find ./config/examples -name "*.toml" | xargs -I{} sh -c "vector --dry-run -c {} || exit 255"
Oct 16 12:25:15.740  INFO vector: Log level "info" is enabled.
Oct 16 12:25:15.740  INFO vector: Loading config. path="./config/examples/complete/stdio.toml"
Oct 16 12:25:15.752  INFO vector: Vector is starting. version="0.4.0-dev" git_version="v0.5.0-12-g1378575" released="Tue, 15 Oct 2019 12:24:23 GMT" arch="x86_64"
Oct 16 12:25:15.753  INFO vector: Dry run enabled, exiting after config validation.
Oct 16 12:25:15.754  INFO vector: Config validated, exiting.
Oct 16 12:25:15.776  INFO vector: Log level "info" is enabled.
Oct 16 12:25:15.778  INFO vector: Loading config. path="./config/examples/complete/file_to_prometheus.toml"
Oct 16 12:25:15.796  INFO vector: Vector is starting. version="0.4.0-dev" git_version="v0.5.0-12-g1378575" released="Tue, 15 Oct 2019 12:24:23 GMT" arch="x86_64"
Oct 16 12:25:15.796  INFO vector: Dry run enabled, exiting after config validation.
Oct 16 12:25:15.814 ERROR vector::topology: Configuration error: Source "file": data_dir "/var/lib/vector" does not exist

So naturally we'll want to set -q in our CI stages, but then we lose context as to which config caused our validation error:

$ find ./config/examples -name "*.toml" | xargs -I{} sh -c "vector --dry-run -q -c {} || exit 255"
Oct 16 12:30:18.361 ERROR vector::topology: Configuration error: Source "file": data_dir "/var/lib/vector" does not exist

As a bonus we could also consider implementing triple dot syntax to enable vector --dry-run -c ./config/examples/.... It would make CI scripts simpler. However, this also changes the behaviour of -c based on other flags which is rather naughty.

from vector.

binarylogic commented on May 5, 2024

Validation currently seems to check for things such as the existence of the data_dir, it would be good to suppress errors related to our runtime environment when performing dry runs since it might not be possible to fully replicate our target environment in our CI.

Agree.

We also ought to spit out the config file path in an error log when validation fails.

I also agree. I believe you're saying the same thing, but the only output when --dry-run is passed should be errors and a single success confirmation. I do not think -q should be required. The exit code should also reflect the status (which it currently does).

from vector.

Jeffail commented on May 5, 2024

I've got a branch https://github.com/timberio/vector/tree/improve-config-validation that lowers the default log level (to warn), but it still responds to --quiet and --verbose flags. I've also moved the exit path from a non-healthchecked dry run to before the topology is built and verified because that's where our components are checking for data_dir.

The result is that --dry-run on its own is able to verify that config fields are typed correctly, which will catch typos and bad values, etc. What it doesn't do is check that the overall config is correct (>1 sources, inputs all exist, no circular dependencies, etc).

The benefit of that approach is that it allows us to test our non-complete examples for problems in our CI. When we start allowing config file imports it would also allow users to lint their partial config snippets individually.

I think the sensible approach is the following two steps:

Add another flag to complement --dry-run that indicates the verification should finish before verifying the whole topology (perhaps --partial or something). The behaviour of --dry-run will be changed back to it's current behavior in master.
Separate the data_dir initialization of sources to after build so that the topology can be verified without it.

We'd then end up with three commands:

vector -c ./foo.toml --dry-run --partial, which checks that the config file has correct fields and values for the components it contains (zero sources or sinks is permitted).

vector -c ./foo.toml --dry-run, which checks the above and also that the config has a complete and valid topology.

vector -c ./foo.toml --dry-run --require-healthy, which checks the above and also that the data directory can be constructed and endpoints can be reached, etc.

from vector.

LucioFranco commented on May 5, 2024

Would --validate be better than --dry-run? In my mind dry run would actually show us how events might pass through aka run through the system. Maybe --validate would be a bit more intuitive for users?

that lowers the default log level (to warn)

Is there a reason to make warn default? Maybe I've missed a precedent but I've always considered info to be default.

from vector.

binarylogic commented on May 5, 2024

Yeah, --validate is not a bad idea. Also, please read the comments above regarding the info default.

from vector.

Jeffail commented on May 5, 2024

If we separate out into a new flag we can leave the logging default to info for the --dry-run command, which probably plays nicer with user expectations.

from vector.

binarylogic commented on May 5, 2024

Yep, let's do the following:

Leave --dry-run the way it is. This should perform all validations, including environment verification (ex: validating data_dir).
Add a validate subcommand. Ex: vector validate --config=vector.toml, where --config works exactly the same way with the root level command. To me, this makes more sense as a sub-command, happy to hear other opinions.
Perhaps we could support a --topology=false to disable validating the inputs field as well as circular dependencies?
The vector validate output should be success/fail. Succes should print a confirmation message and exit with a 0 code. Fail should print a user-friendly error message and exit with the appropriate code (see #176 for exit codes).

What do you think?

from vector.

Jeffail commented on May 5, 2024

If we go down the subcommand route then the root level commands are inherited. So for example the command vector -c ./foo.toml validate would be parsed as correct by our command line lib. I think that's acceptable as we can include usage examples in the help text.

I can alternatively add the same flag to the subcommand which would enable vector validate -c ./foo.toml, but then vector -c ./foo.toml validate -c ./bar.toml would also be considered valid. Since the root --config flag has a default value we can't parse it as an optional flag. I'm sure there's a way of catching it and warning the user, but it seems a little gross.

I'll implement with vector -c ./foo.toml validate for now and maybe during code review we can revisit an implementation of vector validate -c ./foo.toml that explicitly denies the root flag.

from vector.

Jeffail commented on May 5, 2024

I've got an implementation here: bcd3cd5

Not sure I like it, the help text that's generated with vector validate --help prints my example but the usage text doesn't indicate the presence of [FLAGS] before the subcommand, which in this case there is.

Validate the target config, then exit.

E.g. `vector --config ./example.toml validate`

USAGE:
    vector validate [FLAGS]

Everyone has different experience with CLIs so it's difficult to say how big of a deal this is. I'm going to work on separating out the topology verification and come back to this.

from vector.

binarylogic commented on May 5, 2024

Yeah, this interface is odd. It appears --config is a global flag that will be available on every command, correct? What are our options for correcting this? (besides the double -c flag). From what I understand the "default" command is using global flags for its arguments.

from vector.

Jeffail commented on May 5, 2024

I've implemented vector validate -c ./foo.toml, where vector -c ./bar.toml validate returns a USAGE error code. It's a little nasty in its implementation so hopefully there's a more elegant solution, but otherwise I think for the sake of usability I can live with this change. Comment here: #1064 (comment)

from vector.

Improve config validation about vector HOT 17 CLOSED

Comments (17)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent