Comments (12)
We'll see if we can figure it out @skymob. I'm not familiar with the Chef-API gem, but I'll dig up the auth code in the Chef server when I have a minute and link it here with any thoughts I have that might point to what is going on.
from chef-server.
Hi @mmzyk - checking back in on this issue. Do you have any suggestions for things we could be looking at on our end? Specific logs, increasing log verbosity, etc? Would a TCPdump on either end be helpful? We ended up adding retries to our gem that wraps the Chef API gem, but even then sometimes the retry fails up to 4 times in a row.
from chef-server.
Hi @skymob. Sorry for the long silence on this. You caught me on a two week vacation and then coming back to some internal work that had to be sorted out. I'm going to dig into the chef code and pull out where this is failing and maybe it can lead us to why.
from chef-server.
So the workhorse file that's doing the auth check is here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl
While the Erlang code might look strange, I think the naming is clear enough that you can follow this and get an idea where or why it might be failing.
So, one thing to know is that this file is plugged into webmachine, which is the webserver Chef uses. Based on what the functions in the chef_wm_base file return, webmachine determines which response to send. More info on the functions webmachine looks for can be found here: https://github.com/basho/webmachine/wiki/Resource-Functions
The key takeaway is that webmachine looks for the is_authorized function and if that function returns anything other than true, a 401 is returned. So let's look at what is_authorized is doing.
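As a rough illustration of that rule (a Python sketch, not webmachine's actual Erlang code; the function name `auth_gate` is made up for this example), the decision boils down to something like this:

```python
def auth_gate(is_authorized_result):
    """Mimic the rule described above: only `True` lets the request through.

    Returns None when processing should continue, or 401 when the
    request should be rejected. The name and the return convention are
    hypothetical; webmachine itself is written in Erlang.
    """
    if is_authorized_result is True:
        return None  # keep going through the rest of the resource functions
    return 401  # anything other than true ends in 401 Unauthorized


if __name__ == "__main__":
    print(auth_gate(True))
    print(auth_gate("some failure value"))
```

The point is that the check is strict: any value other than true, including an error term built further down the call chain, ends up as a 401.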
https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L174
We can see it's calling verify_request_signature, which is doing most of the work. That function is found here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L174
verify_request_signature is gathering a bunch of info (which I'm going to assume is working properly, as the error message you gave isn't one of not_found). It is then calling out here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L174 to actually authenticate the request.
That call is using some included code that can be found on github here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L422
It looks like the error results from this part of the called code: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L426
So we need to figure out what is happening in the code that actually runs and throws the error.
The code that throws the error is somewhere in the do_authenticate_user_request function here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L438
I've got to leave it at that for now, but will try to circle back around to look at this more. Hopefully that gives you enough to go on to possibly look at this more on your end.
from chef-server.
All right, coming back to this, I see that my copy/paste foo failed in the last message I posted and I posted the same link to chef_wm_base line 174 three times. Go figure.
So, just to wrap up some loose ends on how the code works (I've been purposefully thinking out loud here, or maybe typing out loud, mostly because I don't know what the cause of this error is going to be, so I want to give you as much info as possible to try and solve it).
The entry point for this code is here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L174, in the is_authorized function.
It will move to the verify_request_signature function, here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L261
If this method fails, https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L287, we construct the failure message, by calling verify_request_message, and so if we follow that we can find the exact error message that you see, which is coming from here: https://github.com/opscode/chef_wm/blob/master/src/chef_wm_base.erl#L469
So that wraps up where the error message is coming from, but this doesn't tell us what is causing the error. That takes us back to the previous code tracing I was doing in the last comment.
I had traced the code across modules to chef_authn and the authenticate_user_request function, https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L422
In that function is the code that causes the error message above, https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L426
So I left off trying to find what was triggering that error path. That took me down to do_authenticate_user_request, https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L438
Following the code, no method before verify_sig will cause the error seen. The call to verify_sig is here https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L438 and the verify_sig function is just below: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L455
The code actually branches here based on the signing version used. If it is 1.0 or 1.1 it will go this path:
https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L455
If it is 1.2 it will go this path: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L465
1.1 is the default, as defined in the code here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L86
(FYI, the macros that the ?SIGNING_VERSION strings in the code reference that define these items are located in the header file here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.hrl so that is how those values are being resolved)
Since I can't be sure which signing version you are using from the info I have, if we go back a bit, we can see it is pulled from the headers here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L445 and is then passed along through the request. So you should be able to look at the headers being sent and see if you are using 1.0, 1.1, or 1.2 and follow the code path as appropriate.
The code that pulls the sign version from the header, if you follow the code deep enough, is here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L392 You can see it's looking for the X-Ops-Sign value to determine what the signing version is, so look for that header value in your requests.
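If you want to check this on the client side, a small helper along these lines could pull the version out of a captured header. This is only a sketch: it assumes the header value looks like `algorithm=sha1;version=1.1;`, the function and constant names are invented for the example, and the fallback to 1.1 simply mirrors the default mentioned above rather than reproducing the server's exact behaviour.

```python
DEFAULT_SIGNING_VERSION = "1.1"  # default per the thread; constant name is hypothetical


def signing_version(headers):
    """Return the signing version found in an X-Ops-Sign header.

    `headers` is a plain dict of captured request headers. The assumed
    value format is semicolon-separated key=value pairs.
    """
    raw = headers.get("X-Ops-Sign", "")
    for part in raw.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key.lower() == "version":
            return value.strip()
    return DEFAULT_SIGNING_VERSION


if __name__ == "__main__":
    print(signing_version({"X-Ops-Sign": "algorithm=sha1;version=1.0;"}))
```

Running this over the headers of a failing and a successful request would at least tell you which branch of verify_sig each one takes.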
So, if we follow the 1.0, 1.1 default path, we're going to call decrypt_sig https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L392 with the AuthSig value and the public key. decrypt_sig is here, and it branches based on if we find an RSA public key or not: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L469
However, note that if we don't find an RSA Public Key, we assume the key is encoded and try to decode it, then call decrypt_sig again. For completeness, the decode_key function is here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L494
So when decrypt_sig returns, in the 1.0 and 1.1 default path you can see that we attempt to do what looks like an assignment here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L458. Except we passed in Plain, so what we're really doing is pattern matching. If the result of decrypt_sig doesn't match Plain, which was already bound when it was passed in, then the match fails and Erlang raises a badmatch error, which will trigger the code path seen above. Plain was set in the previous function here: https://github.com/opscode/chef_authn/blob/master/src/chef_authn.erl#L447
If we follow the 1.2 code path, we get into a similar situation, in that in the verify_sig function for that code path we call public_key:verify and match it against true. If public_key:verify doesn't return true, then we fail the match, resulting in the error seen.
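To make the pattern-matching point concrete, here is a toy sketch in Python. It is not the chef_authn code: the key is the tiny textbook RSA pair (p=61, q=53), the messages are small integers, and `BadMatch`, `sign`, `decrypt_sig`, and `verify_sig` are names picked for the illustration. It only shows the shape of the check: recover the plain text from the signature with the public key, then require it to equal the Plain value the server built itself.

```python
class BadMatch(Exception):
    """Stand-in for the match failure Erlang raises."""


# Toy textbook RSA parameters (n = 61 * 53). Far too small for real use.
N, E, D = 3233, 17, 2753


def sign(plain):
    # Client side: apply the private exponent to the plain value.
    return pow(plain, D, N)


def decrypt_sig(sig):
    # Server side: apply the public exponent to recover the plain value.
    return pow(sig, E, N)


def verify_sig(plain, sig):
    # Equivalent of matching the decrypted signature against an
    # already-bound Plain: any difference is a hard failure.
    recovered = decrypt_sig(sig)
    if recovered != plain:
        raise BadMatch("decrypted signature does not match expected plain text")
    return True


if __name__ == "__main__":
    signature = sign(65)
    print(verify_sig(65, signature))
    try:
        verify_sig(66, signature)
    except BadMatch as err:
        print("rejected:", err)
```

The takeaway is that anything that makes the client's signed text differ from what the server reconstructs will surface as this same failure.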
The functions that are in the public_key module and are called here are located in the core Erlang library. http://erlang.org/doc/man/public_key.html
So, to come full circle, I can't say exactly what is happening @skymob, except that for some reason the auth info being sent across is failing to authenticate properly after it reaches a very deep level where it is comparing the key and the signature. We do know that having a client with the same name as a user can cause this, but in that case it would be expected to fail for each request, not in the pattern being seen here.
I'd suggest trying to capture the auth info being sent from both failing and successful requests and seeing if there is a difference. Given this is an intermittent issue (meaning there doesn't seem to be a consistent reproducible case, not that it doesn't happen often), I am inclined to think this is an issue with the chef-api gem and not with the server itself, especially since knife works just fine. I am not surprised this fails across different types of requests, as this auth code is fundamental to every request made to the chef server.
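One way to do that comparison, sketched in Python under the assumption that you can dump each request's headers into a dict (the function name `diff_headers` and the sample values are made up for the example):

```python
def diff_headers(ok, failed):
    """Return {header: (value_in_ok, value_in_failed)} for every header
    that differs between a successful and a failing request."""
    keys = sorted(set(ok) | set(failed))
    return {
        key: (ok.get(key), failed.get(key))
        for key in keys
        if ok.get(key) != failed.get(key)
    }


if __name__ == "__main__":
    # Placeholder values only; real captures would come from your client logs.
    good = {"X-Ops-Sign": "algorithm=sha1;version=1.1;", "X-Ops-Userid": "someuser"}
    bad = {"X-Ops-Sign": "algorithm=sha1;version=1.1;", "X-Ops-Userid": "someuser",
           "X-Ops-Content-Hash": "placeholder"}
    for name, (a, b) in diff_headers(good, bad).items():
        print(name, a, b)
```

Timestamps and signatures will naturally differ between any two requests, so the interesting findings are headers that are missing, duplicated, or malformed in the failing case only.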
Let us know if during further investigation you still think this might be a server issue, but I'm going to close this issue out for now, since I don't believe this to be a chef server issue but instead to likely be a chef-api gem issue. The chef-api gem is a community project and is not maintained by Chef, to be clear on that.
Hopefully this helps @skymob. Good luck figuring it out.
from chef-server.
@mmzyk, this makes sense. Thanks again for your very thorough research!
from chef-server.
I tried filing this issue with the chef-api gem, but the maintainer is still unconvinced. chef-boneyard/chef-api#32
from chef-server.
@phene For better or worse, the only reports I've seen of this happening are with the chef-api gem, which isn't a Chef maintained project. Beyond what I've already investigated I don't have the bandwidth to devote to trying to track this down, especially since I don't personally use the chef-api gem. The maintainer may be right or wrong, but it's likely going to be up to one of the users of the chef-api gem to try and track this down or come up with a consistently reproducible case. Hopefully the code paths I've laid out above can be helpful to anyone who wants to try and track this down.
from chef-server.
also having this problem intermittently.
from chef-server.
I can reproduce this. I've covered the full details with scripts and tcp dumps here:
chef-boneyard/chef-api#32 (comment)
from chef-server.
@spuder Thanks for the detailed investigation. I'll try to take a look at the issue this week.
from chef-server.
@spuder If you could retry your various tests with chef-boneyard/chef-api#39, I'd love to know if that solves it for you.
from chef-server.