
Net::OpenTimeout exception about dogapi-rb CLOSED

datadog avatar datadog commented on August 15, 2024
Net::OpenTimeout exception

from dogapi-rb.

Comments (21)

degemer avatar degemer commented on August 15, 2024

Thanks for reporting this, @SteveAlexander!
Do you have more details?
dogapi and the Datadog agent are completely independent, so the presence or absence of the agent shouldn't change anything. Does the agent also have network issues (its logs are in /var/log/datadog/forwarder.log), or is it only dogapi? How often does this happen?

On a side note, if you're only using dogapi to send metrics, you might want to consider using dogstatsd-ruby, which goes through the agent (it retries on network errors).
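To illustrate what "going through the agent" means on the wire, here is a minimal sketch that sends a gauge to a local agent as a DogStatsD datagram over UDP, using only the Ruby standard library. It is not the dogstatsd-ruby gem's implementation; the helper names `dogstatsd_gauge` and `send_gauge` are hypothetical, and the host and port (127.0.0.1:8125, the agent's default DogStatsD port) are assumptions about the local setup.

```ruby
require "socket"

# Build a DogStatsD gauge datagram, e.g. "name:1|g|#tag_a:x,tag_b:y".
# (Hypothetical helper, not part of any Datadog library.)
def dogstatsd_gauge(name, value, tags = [])
  datagram = "#{name}:#{value}|g"
  datagram += "|#" + tags.join(",") unless tags.empty?
  datagram
end

# Fire-and-forget send to a local agent. UDP gives no delivery
# confirmation, so this returns quickly even if no agent is listening.
# Host and port are assumptions about the local agent configuration.
def send_gauge(name, value, tags = [], host: "127.0.0.1", port: 8125)
  socket = UDPSocket.new
  socket.send(dogstatsd_gauge(name, value, tags), 0, host, port)
ensure
  socket&.close
end
```

Because the datagram is handed to the agent locally, the application never opens an HTTPS connection of its own, which is the step that raises Net::OpenTimeout in the reports below.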


nielsm avatar nielsm commented on August 15, 2024

I'm getting the same thing with fairly regular timeouts. We've gone so far as to set a 120 second socket timeout on the client, but still get several timeouts per day.


rmoriz avatar rmoriz commented on August 15, 2024

Using the capistrano plugin and also getting 'Could not submit to Datadog, request timed out.' a lot.


msaffitz avatar msaffitz commented on August 15, 2024

👍 on this. Our code is pretty simple, but runs in a loop every 10s. Here's the offending snippet:

  def self.check
    datadog = Dogapi::Client.new(ENV["DATADOG_API_KEY"])
    clusters.each do |cluster|
      consumer_groups(cluster).each do |consumer_group|
        status = consumer_status(cluster, consumer_group)
        if status["partitions"] == []
          puts "[#{Time.now.utc}] #{consumer_group}: Missing Partitions"
          tags = [
            "consumer_group:#{consumer_group}",
            "status:missing_partitions"
          ]
          datadog.emit_point("kafka_partition_status", 1, tags: tags)
        end
        grouped = status["partitions"].group_by { |partition| "#{partition['topic']}--#{partition['status']}" }
        grouped.each do |_, partitions|
          topic = partitions.first["topic"]
          status = partitions.first["status"]
          count = partitions.count
          puts "[#{Time.now.utc}] #{consumer_group} #{topic} #{status} count: #{count}"
          tags = [
            "topic:#{topic}",
            "consumer_group:#{consumer_group}",
            "status:#{status}"
          ]
          datadog.emit_point("kafka_partition_status", count, tags: tags)
        end
      end
    end
  end

Stack Trace:

2017-05-02T00:21:52.818985351Z /usr/local/lib/ruby/2.2.0/net/http.rb:879:in `initialize': execution expired (Net::OpenTimeout)
2017-05-02T00:21:52.819033546Z 	from /usr/local/lib/ruby/2.2.0/net/http.rb:879:in `open'
2017-05-02T00:21:52.819041357Z 	from /usr/local/lib/ruby/2.2.0/net/http.rb:879:in `block in connect'
2017-05-02T00:21:52.819047458Z 	from /usr/local/lib/ruby/2.2.0/timeout.rb:88:in `block in timeout'
2017-05-02T00:21:52.819053696Z 	from /usr/local/lib/ruby/2.2.0/timeout.rb:98:in `call'
2017-05-02T00:21:52.819058329Z 	from /usr/local/lib/ruby/2.2.0/timeout.rb:98:in `timeout'
2017-05-02T00:21:52.819062870Z 	from /usr/local/lib/ruby/2.2.0/net/http.rb:878:in `connect'
2017-05-02T00:21:52.819067620Z 	from /usr/local/lib/ruby/2.2.0/net/http.rb:863:in `do_start'
2017-05-02T00:21:52.819072589Z 	from /usr/local/lib/ruby/2.2.0/net/http.rb:852:in `start'
2017-05-02T00:21:52.819077432Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/common.rb:97:in `connect'
2017-05-02T00:21:52.819082421Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/common.rb:117:in `request'
2017-05-02T00:21:52.819089117Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/v1/metric.rb:24:in `upload'
2017-05-02T00:21:52.819094251Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/v1/metric.rb:29:in `submit_to_api'
2017-05-02T00:21:52.819098836Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/v1/metric.rb:48:in `submit'
2017-05-02T00:21:52.819103634Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/facade.rb:88:in `emit_points'
2017-05-02T00:21:52.819120373Z 	from /usr/local/bundle/gems/dogapi-1.27.0/lib/dogapi/facade.rb:61:in `emit_point'
2017-05-02T00:21:52.819125595Z 	from /app/lib/check.rb:78:in `block (3 levels) in check'
2017-05-02T00:21:52.819147208Z 	from /app/lib/check.rb:68:in `each'
2017-05-02T00:21:52.819151664Z 	from /app/lib/check.rb:68:in `block (2 levels) in check'
2017-05-02T00:21:52.819156007Z 	from /app/lib/check.rb:57:in `each'
2017-05-02T00:21:52.819160277Z 	from /app/lib/check.rb:57:in `block in check'
2017-05-02T00:21:52.819164581Z 	from /app/lib/check.rb:56:in `each'
2017-05-02T00:21:52.819169164Z 	from /app/lib/check.rb:56:in `check'
2017-05-02T00:21:52.819173658Z 	from /app/run:9:in `block (2 levels) in <main>'
2017-05-02T00:21:52.819178178Z 	from /app/run:8:in `loop'
2017-05-02T00:21:52.819182701Z 	from /app/run:8:in `block in <main>'
2017-05-02T00:21:52.819187217Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/application.rb:265:in `call'
2017-05-02T00:21:52.819192034Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/application.rb:265:in `block in start_proc'
2017-05-02T00:21:52.819196583Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/application.rb:274:in `call'
2017-05-02T00:21:52.819201133Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/application.rb:274:in `start_proc'
2017-05-02T00:21:52.819208989Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/application.rb:295:in `start'
2017-05-02T00:21:52.819213991Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/controller.rb:59:in `run'
2017-05-02T00:21:52.819218388Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons.rb:193:in `block in run_proc'
2017-05-02T00:21:52.819222905Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/cmdline.rb:88:in `call'
2017-05-02T00:21:52.819227489Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons/cmdline.rb:88:in `catch_exceptions'
2017-05-02T00:21:52.819232413Z 	from /usr/local/bundle/gems/daemons-1.2.3/lib/daemons.rb:192:in `run_proc'
2017-05-02T00:21:52.819236758Z 	from /app/run:6:in `<main>'


dgreene-r7 avatar dgreene-r7 commented on August 15, 2024

I'm seeing this as well. It's unclear whether this is a consequence of rate limiting or something else. Can anyone from DD look into this?


dpavlov-smartling avatar dpavlov-smartling commented on August 15, 2024

I can confirm this as well. The error is exactly the same as the one already reported:

/usr/share/ruby/2.0/net/http.rb:878:in `initialize': execution expired (Net::OpenTimeout)
	from /usr/share/ruby/2.0/net/http.rb:878:in `open'
	from /usr/share/ruby/2.0/net/http.rb:878:in `block in connect'
	from /usr/share/ruby/2.0/net/http.rb:877:in `connect'
	from /usr/share/ruby/2.0/net/http.rb:862:in `do_start'
	from /usr/share/ruby/2.0/net/http.rb:851:in `start'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/common.rb:97:in `connect'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/common.rb:117:in `request'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/v1/metric.rb:24:in `upload'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/v1/metric.rb:29:in `submit_to_api'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/v1/metric.rb:48:in `submit'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/facade.rb:88:in `emit_points'
	from /usr/local/share/ruby/gems/2.0/gems/dogapi-1.25.0/lib/dogapi/facade.rb:61:in `emit_point'

There is also an open support case, number 95923. DD, is there any update on this?


rmoriz avatar rmoriz commented on August 15, 2024

This issue is really annoying and we suffer from it on a daily basis. 😞


dgreene-r7 avatar dgreene-r7 commented on August 15, 2024

We put some sleep time between our API calls and stopped seeing this error. More and more this looks like rate limiting even though the DD rep I've been working with says that it isn't.


rmoriz avatar rmoriz commented on August 15, 2024

We have automated deployment processes and chef runs, so we can't put a "sleep" in our code…


dpavlov-smartling avatar dpavlov-smartling commented on August 15, 2024

@dgreene-r7 we are submitting up to 300 metrics, and if we add even 1 second of sleep between calls we will wait up to 5 minutes. This doesn't work if you are submitting a large number of metrics, or in the case described by @rmoriz.


dgreene-r7 avatar dgreene-r7 commented on August 15, 2024

I wasn't suggesting that was a solution in any way; simply that this timeout error might be DD throttling us without actually telling us they're doing it.


rmoriz avatar rmoriz commented on August 15, 2024

@dgreene-r7 sorry, my comment was not meant as criticism of your post or workaround. I share your conclusion that DD has some rate limiting deployed that breaks even regular use cases. DD needs to solve this ASAP.


truthbk avatar truthbk commented on August 15, 2024

I'm not the maintainer, but I took a quick look at this out of curiosity. The issue seems to be local to the host (nothing to do with API rate-limiting). This is the offending snippet: https://github.com/ruby/ruby/blob/v2_2_0/lib/net/http.rb#L877-L880

The issue is that dogapi-rb attempts to open a new socket for each request and, for some reason, perhaps when called in a tight (or relatively tight) loop, this results in an OpenTimeout error. At first I thought we might have a leak that was making us run out of file descriptors, but that does not seem to be the problem here. Maybe the overhead of connection setup and teardown alone eventually leads to the timeout.
I believe this could be mitigated by recycling TCP connections rather than opening a new socket for every point emitted to the API.
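As a rough illustration of the connection recycling idea, the sketch below keeps a single `Net::HTTP` object alive and sends every request over it, so a loop that posts many points does not open a new socket each time. This is not how dogapi-rb is implemented; the class name `ReusableConnection`, the timeout values, and any path passed to `post_json` are hypothetical.

```ruby
require "net/http"
require "json"
require "uri"

# Holds one Net::HTTP object so repeated requests can share a single
# TCP/TLS connection. (Hypothetical helper, not part of dogapi-rb.)
class ReusableConnection
  attr_reader :http

  def initialize(base_url, open_timeout: 5, read_timeout: 10)
    uri = URI(base_url)
    @http = Net::HTTP.new(uri.host, uri.port)
    @http.use_ssl = (uri.scheme == "https")
    # open_timeout bounds the connect phase; exceeding it raises
    # Net::OpenTimeout. read_timeout bounds waiting for a response.
    @http.open_timeout = open_timeout
    @http.read_timeout = read_timeout
    @http.keep_alive_timeout = 30
  end

  # Opens the connection on first use, then reuses it on later calls.
  def post_json(path, payload)
    @http.start unless @http.started?
    request = Net::HTTP::Post.new(path, "Content-Type" => "application/json")
    request.body = JSON.generate(payload)
    @http.request(request)
  end

  def close
    @http.finish if @http.started?
  end
end
```

A caller would create one `ReusableConnection` outside the loop, call `post_json` with whatever path and payload the API expects, and call `close` on shutdown. Note that this only reduces how often a connection is opened; if the connect itself fails for a host-level reason, the first `start` can still raise Net::OpenTimeout.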

@degemer any thoughts on this?


truthbk avatar truthbk commented on August 15, 2024

So... as far as investigations go, we have made some progress on this, although I'm not 100% sure of the actual root cause. The issue comes from the system level, which we've been able to corroborate with some strace profiling:

socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "<redacted>", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "<redacted>", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)

Also notice we can see healthy connections being established via IPv4 in the same strace:

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("<redacted>")}, 16) = 0
getsockname(7, {sa_family=AF_INET, sin_port=htons(37080), sin_addr=inet_addr("<redacted>")}, [16]) = 0

So we know the issue actually comes from the OS level. What's unclear to me at this point is why the network is unreachable. My most "solid" theory is that a DNS query is being made to resolve a hostname, and we're getting an IPv6 response (notice the socket being created is an IPv6/UDP socket). We then use that to set up the connection, etc., but we may be on an IPv4-only network?!? However, there are no getaddrinfo() calls in the strace, which would typically signal DNS resolution, so it is hard to verify this theory. Alternatively, users may simply be using an IPv6 address on an (unknowingly) IPv4-only network(?).

If anyone has any other ideas please let me know. So... this leads me to think that either:
a) The library is entirely exonerated from any fault, the problem being elsewhere (i.e. the DNS resolver).
b) The library is not handling the case where we may be getting several DNS records (IPv6 and IPv4).
c) Some other awkward edge case.
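One way to check theories (a) and (b) on an affected host is to see which address families the hostname resolves to and whether each address accepts a TCP connection. The sketch below uses only the Ruby standard library; the helper names `split_by_family` and `reachable?` and the 2 second timeout are hypothetical, and it is a diagnostic aid rather than a fix.

```ruby
require "ipaddr"
require "resolv"
require "socket"

# Group textual IP addresses by family.
# (Hypothetical helper for diagnosis only.)
def split_by_family(addresses)
  groups = { ipv4: [], ipv6: [] }
  addresses.each do |address|
    ip = IPAddr.new(address.to_s)
    groups[ip.ipv6? ? :ipv6 : :ipv4] << address.to_s
  end
  groups
end

# Try a TCP connect to a single address with a short timeout.
# Returns true on success and false on a network-level failure such as
# ENETUNREACH or a connect timeout.
def reachable?(address, port, timeout: 2)
  Socket.tcp(address, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Usage (performs real DNS lookups and connections):
#   families = split_by_family(Resolv.getaddresses("app.datadoghq.com"))
#   families.each do |family, addresses|
#     addresses.each { |a| puts "#{family} #{a} reachable=#{reachable?(a, 443)}" }
#   end
```

If the IPv6 addresses are consistently unreachable while the IPv4 ones connect, that would point at the host's network configuration or address selection rather than at the library.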


rmoriz avatar rmoriz commented on August 15, 2024

Our systems are deployed dual-stack and prefer IPv6 connections by default. app.datadoghq.com resolves to an IPv6 ELB, so our attempts are probably made over IPv6 all the time.


OPhamster avatar OPhamster commented on August 15, 2024

I'm getting the same error, albeit in our scheduled containers: every time the container spins up, it reports metrics and then shuts down, so we can't exactly recycle TCP connections. Is there any way we can sidestep this issue for now?
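For short-lived jobs like this, one possible stopgap is to retry the call when the connect phase times out. The sketch below is a generic wrapper, not something dogapi-rb provides; the method name `with_open_timeout_retry` and its defaults are hypothetical. It only helps where Net::OpenTimeout actually propagates to the caller, as in the stack traces posted earlier in this thread.

```ruby
require "net/http"

# Run the block, retrying with exponential backoff when the connect
# phase times out. Re-raises once the attempts are exhausted.
# (Hypothetical helper, not part of dogapi-rb.)
def with_open_timeout_retry(attempts: 3, base_delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue Net::OpenTimeout
    raise if tries >= attempts
    # Wait base_delay, then 2x, 4x, ... before trying again.
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end
```

With the snippet posted above, a call would look like `with_open_timeout_retry { datadog.emit_point("kafka_partition_status", count, tags: tags) }`. This masks the symptom rather than addressing the underlying cause, and it lengthens the run by the backoff delays whenever a timeout occurs.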


cloud-overlord avatar cloud-overlord commented on August 15, 2024

Same error


masci avatar masci commented on August 15, 2024

Regarding the original question: the presence of a running Agent shouldn't affect the library in any way, so I suspect that is unrelated.

I wasn't able to reproduce this myself, but from the investigation @truthbk conducted, the problem seems to be at the OS level, likely determined by a specific configuration. That said, if we can provide a workaround at the library level I'd be happy to ship it, but I need a failing snippet of code and an environment in which I can reproduce the problem to work on a fix.


alexdesi avatar alexdesi commented on August 15, 2024

Same error.
I get roughly 2 to 10 Net::OpenTimeout exceptions per day when calling emit_point.
Any news about this?


github-actions avatar github-actions commented on August 15, 2024

Thanks for your contribution!

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

If you would like this issue to remain open:

  1. Verify that you can still reproduce the issue in the latest version of this project.

  2. Comment that the issue is still reproducible and include updated details requested in the issue template.


gzussa avatar gzussa commented on August 15, 2024

Leaving this closed, since we can't reproduce the error and therefore can't fix (or at least try to fix) the problem at the library level. See @masci's comment above.

