Comments (18)
Because of a hunch I replaced the http
dependency of our plugin with resty.http
. So far we haven't had any more issues. Seems like for some reason the lua-http dependency was hogging the DNS-Socket. Not sure why that was only happening with newer versions of Kong, probably some kind of race condition. Anyways, cheers for the support
from kong.
Could it be related to this issue (currently open): #12290
It's not the same error in the logs, but a similar response and related to DNS as well
from kong.
We found the change that causes worsens our issue: #11871
We updated our kong.conf
with dns_no_sync = on
(the name confused me a bit, first I turned it off, thinking it would turn off dns_sync, then I noticed the no in there)
This resolves our problem, but the underlying issue is still there if someone cares to investigate
The problem persists, but happens quite rarely.
from kong.
Possibly related to #3223?
from kong.
Any input on this?
from kong.
@chobits , could you give us some comments about this?
from kong.
will look into it later
from kong.
socket busy
This might be sock:setpeername() is trying to resolve the name, during this process, if someone tries to execute sock:send()
, then it will report socket busy
error
We updated our kong.conf with dns_no_sync = on, The problem persists, but happens quite rarely.
So it is as expected when you enable sync, which means a large number network queries for the same domain at the same time in one worker have been merged into one coroutine. It is like fewer DNS queries trigger fewer DNS errors.
If you could provide debug log in the context of error message, it would be beneficial to debug for us.
from kong.
Hi @chobits, thanks for chipping in
Here are the logs. I included some startup logs as well so you can see the configuration. It happened right off the bat as well
Edit: I tried with Kong 3.6.1 (the latest available version on Dockerhub)
kong-logs.csv
The problem is quite severe for us (at least with the DNS sync enabled) and happens even if there are very few request being made. The example above are a few system tests that run through. I guess that there are about 50 calls within 2 minutes. I don't imagine it is a heavy load problem, rather a configuration problem on our end or something else
Some more information that could possibly be relevant:
- We have a custom auth plugin installed with a few lua deps
- We use the official ubuntu image and update the dependencies in our custom dockerfile (because of our vulnerability management policies)
- We are inside a Google Kubernetes Cluster and behind a Load Balancer that provides the SSL-Support
- We deploy via a custom helm chart, since we just need a minimal setup
from kong.
Sry for late reply, I have reviewed your kong-logs.csv, but I dont find some useful logs to help us to investigate. Because most useful debug logs have been commentted in DNS client library.
And I guess there might be some incorrect DNS resoluton operate executed by sock:connect()
in lua-nginx-module, I still need to spend some time on that and then tell you how to debug it.
BTW, Is xxx
hostname in your logs a hidden domain for the real domain name?
from kong.
hi I have thought of a method to mitigate your issue, you could configure dns_stale_ttl
from 4s to a longer value. Currently in kong latest version, we have set it to 3600s, you could also set it to this value, which gives your dns records a longer lifespan.
from kong.
Thanks for the reply.
Yes xxx
is a real kubernetes domain name(s) (internal DNS)
It seems like the staleTtl is already at 3600
as far as I can tell. Please correct me if I'm wrong here. Here a snippet from the logs that describes the DNS settings more verbosely (staleTtl
being the second from the bottom):
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:647: init(): [dns-client] emptyTtl = 30 s","2024-03-07T08:24:08.956815518Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:645: init(): [dns-client] badTtl = 1 s","2024-03-07T08:24:08.956632467Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:633: init(): [dns-client] search = king.svc.cluster.local, svc.cluster.local, cluster.local, google.internal","2024-03-07T08:24:08.956478982Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:631: init(): [dns-client] ndots = 5","2024-03-07T08:24:08.956330928Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:627: init(): [dns-client] timeout = 2000 ms","2024-03-07T08:24:08.956134323Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:618: init(): [dns-client] no_random = true","2024-03-07T08:24:08.955963978Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:611: init(): [dns-client] attempts = 5","2024-03-07T08:24:08.955823817Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:606: init(): [dns-client] nameserver 10.0.0.10","2024-03-07T08:24:08.955685968Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:565: init(): [dns-client] validTtl = nil","2024-03-07T08:24:08.955670959Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]","2024-03-07T08:24:08.955662768Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:541: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1","2024-03-07T08:24:08.955627713Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]","2024-03-07T08:24:08.954856727Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]","2024-03-07T08:24:08.954672873Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]","2024-03-07T08:24:08.954364051Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]","2024-03-07T08:24:08.954047780Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]","2024-03-07T08:24:08.953833227Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:541: init(): [dns-client] adding A-record from 'hosts' file: xxx = xxx","2024-03-07T08:24:08.953549257Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:556: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]","2024-03-07T08:24:08.953099393Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:501: init(): [dns-client] query order = LAST, SRV, A, CNAME","2024-03-07T08:24:08.952530616Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:482: init(): [dns-client] noSynchronisation = false","2024-03-07T08:24:08.950206705Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:478: init(): [dns-client] staleTtl = 3600","2024-03-07T08:24:08.950063140Z"
"2024/03/07 08:24:08 [debug] 1325#0: *1294 [lua] client.lua:473: init(): [dns-client] (re)configuring dns client","2024-03-07T08:24:08.949858585Z"
We don't override any DNS related options in the config file. Here our current kong.conf
(injected into the kong:3.4.2
image):
plugins = auth-internal
declarative_config = /etc/kong/kong.yml
nginx_http_lua_shared_dict=auth_internal 10m
# Change for kong>=3.5 cf https://github.com/Kong/kong/issues/12568
# dns_no_sync = on
Thanks @chobits for your support 🙏
from kong.
Could you try this docker image: kong/kong:3a034e9417743fcf946deb6f98d2993049b1e589-ubuntu
which is built from #12305 based 3.6.x.x. It has passed all our test cases.
In this PR, we refactor our DNS client, if you could try this PR's docker image, it could help us to verify whether this issue still be there in new DNS client. If both versions of kong have this problem, I guess DNS client socket might be messed up by some other subsystem in kong.
from kong.
Thanks for the update @chobits. We get the following error with kong/kong:3a034e9417743fcf946deb6f98d2993049b1e589-ubuntu
:
2024/03/26 08:27:37 [error] 1331#0: *90612 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/resty/dns_client/init.lua:669: attempt to call method 'lower' (a nil value)
stack traceback:
coroutine 0:
/usr/local/share/lua/5.1/kong/resty/dns_client/init.lua: in function 'execute_original_func'
/usr/local/share/lua/5.1/kong/dynamic_hook/init.lua:144: in function 'raw_func'
/usr/local/share/lua/5.1/kong/tracing/instrumentation.lua:288: in function 'toip'
/usr/local/share/lua/5.1/kong/runloop/balancer/init.lua:364: in function 'execute'
/usr/local/share/lua/5.1/kong/runloop/handler.lua:832: in function 'balancer_execute'
/usr/local/share/lua/5.1/kong/runloop/handler.lua:1352: in function 'after'
/usr/local/share/lua/5.1/kong/init.lua:1186: in function 'access'
access_by_lua(nginx-kong.conf:112):2: in main chunk, client: xxx, server: kong, request: "POST /path/to/resource HTTP/1.1", host: "xxx", request_id: "2a1bbe203f4c2a309845b0777f2b7862"
Additionally in the beginning we get the following (still the same)
[error] 1330#0: *90586 [lua] init.lua:371: execute(): DNS resolution failed: DNS server error: failed to send request to UDP server 10.0.0.10:53: socket busy, Query Time: 1711441647.111 0.000. Tried: [["cluster.address.local:SRV","DNS server error: failed to send request to UDP server 10.0.0.10:53: socket busy, Query Time: 1711441647.111 0.000"]], client: xxx, server: kong, request: "GET /path/to/resource", host: "xxx", request_id: "0b7dc268d13c6648f089681aa55df436"
from kong.
Thanks for the update @chobits. We get the following error with
kong/kong:3a034e9417743fcf946deb6f98d2993049b1e589-ubuntu
:2024/03/26 08:27:37 [error] 1331#0: *90612 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/resty/dns_client/init.lua:669: attempt to call method 'lower' (a nil value) stack traceback: coroutine 0: /usr/local/share/lua/5.1/kong/resty/dns_client/init.lua: in function 'execute_original_func' /usr/local/share/lua/5.1/kong/dynamic_hook/init.lua:144: in function 'raw_func' /usr/local/share/lua/5.1/kong/tracing/instrumentation.lua:288: in function 'toip' /usr/local/share/lua/5.1/kong/runloop/balancer/init.lua:364: in function 'execute' /usr/local/share/lua/5.1/kong/runloop/handler.lua:832: in function 'balancer_execute' /usr/local/share/lua/5.1/kong/runloop/handler.lua:1352: in function 'after' /usr/local/share/lua/5.1/kong/init.lua:1186: in function 'access' access_by_lua(nginx-kong.conf:112):2: in main chunk, client: xxx, server: kong, request: "POST /path/to/resource HTTP/1.1", host: "xxx", request_id: "2a1bbe203f4c2a309845b0777f2b7862"
Did it constantly occur if you request this request: "POST /path/to/resource HTTP/1.1", host: "xxx", request_id: "2a1bbe203f4c2a309845b0777f2b7862"
?
Did your plugin call this kind of methods: kong.service.set_target()
or kong.service.set_upstream()
.
From kong's logic, it seems that the balancer_data.host
should be a string, so if we call balancer_data.host:lower()
, it will success while not reporting a runtime error.
Additionally in the beginning we get the following (still the same)
[error] 1330#0: *90586 [lua] init.lua:371: execute(): DNS resolution failed: DNS server error: failed to send request to UDP server 10.0.0.10:53: socket busy, Query Time: 1711441647.111 0.000. Tried: [["cluster.address.local:SRV","DNS server error: failed to send request to UDP server 10.0.0.10:53: socket busy, Query Time: 1711441647.111 0.000"]], client: xxx, server: kong, request: "GET /path/to/resource", host: "xxx", request_id: "0b7dc268d13c6648f089681aa55df436"
So it might be the dns socket messed up, but currently your image has no nginx debug log enabled like --with-debug
, so if you enable debug
level, we could not get debug log from nginx's logic/lua-nginx-module's logic, only get the debug log from kong's lua land. Let me think of how to trace it.
from kong.
Did it constantly occur if you request this request: "POST /path/to/resource HTTP/1.1", host: "xxx", request_id: "2a1bbe203f4c2a309845b0777f2b7862"?
No only sporadically. It happens even less with the dns_no_sync
turned on
Did your plugin call this kind of methods: kong.service.set_target() or kong.service.set_upstream().
No we don't call any of these methods. We have a static yaml
that sets those parameters
from kong.
I just tried again with the latest 3.6.1
-Dockerfile and now it seems to work 😕. Strange, but I'll take it. Thanks for your support @chobits. Must be something with the dependencies I'd imagine
from kong.
False alarm. Still happens, just less often
from kong.
Related Issues (20)
- TLS SNI Route not work HOT 7
- Kong info, notice, inspect logs are all getting logged as error in GCP(google cloud platform) HOT 3
- Cannot use kong.db HOT 1
- http-log plugin: Host header not including port HOT 3
- Every time request localhost:8001/metrics, kong-cp-kong-pod will prompt a license-related error HOT 3
- Dataplane not getting information from the ControlPlane in Hybrid mode HOT 6
- go plugin error, worker-events: event callback failed; source=plugin_server, event=reset_instance HOT 2
- Error: attempt to index local 'ssl' (a nil value) for Kong >= 3.6.0 when KONG_PG_SSL=on HOT 6
- Handling two JWT signing keys for same iss HOT 3
- Documentation Improvement - Clarify Plugin Compatibility with Kong Versions](https://github.com/Kong/kong/issues/9999 HOT 1
- Error "error loading plugin schemas: on plugin <my-plugin>: <my-plugin> plugin is enabled but not installed" with Python plugin for kong HOT 6
- Where is kong 3.7.0 container image ? Can't find it on Dockerhub HOT 2
- Hola
- Inconsistent Rate Limiting Behavior Under High Load HOT 2
- No ability in plugin schema to have a record with dynamic fields HOT 2
- Changelog md no longer maintained? HOT 2
- Odd Kong startup errors post 2.8.x to 3.7.0 upgrade HOT 5
- Kong can't match routes correctly with the same hosts, paths, and different protocols in the route. HOT 4
- AWS Lambda plugin in Gateway 2.8.4 missing aws_imds_protocol_version support HOT 1
- https over mtls is not working for upstream HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kong.