Coder Social home page Coder Social logo

Comments (6)

eboasson avatar eboasson commented on August 17, 2024

Curious 🙂

I suspect it is this:

std::vector<char> tmp(128);

and that the condition for retrying is simply wrong. Blame it on my relationship to C++.

Some "TLC" did happen and dds_get_name these days returns the actual length so if at first the buffer is too small, one can simply retry with a buffer that you know to be large enough instead of this kludge.

I don't currently have a build of ROS 2 at hand and won't be able to do a pull request today (the direct result of macOS no longer being tier 1 ...) but if you happen to have everything set up for a quick experiment, you might want to try increasing the 128 in the referenced line.

from rmw_cyclonedds.

yamokosk avatar yamokosk commented on August 17, 2024

Curious 🙂

I suspect it is this:

std::vector<char> tmp(128);

and that the condition for retrying is simply wrong. Blame it on my relationship to C++.
Some "TLC" did happen and dds_get_name these days returns the actual length so if at first the buffer is too small, one can simply retry with a buffer that you know to be large enough instead of this kludge.

I don't currently have a build of ROS 2 at hand and won't be able to do a pull request today (the direct result of macOS no longer being tier 1 ...) but if you happen to have everything set up for a quick experiment, you might want to try increasing the 128 in the referenced line.

I don't think its this. 128 characters doesn't line up with the 99 character limit reported.

Grep'ed the cyclone code base (https://github.com/eclipse-cyclonedds/cyclonedds) and that shows char topic_name[100] is all over the place. Based on the reported character count, this seems a more likely culprit. Yay, magic numbers!

from rmw_cyclonedds.

pauldinh avatar pauldinh commented on August 17, 2024

but if you happen to have everything set up for a quick experiment, you might want to try increasing the 128 in the referenced line.

Yeah this looks like the culprit. I doubled the vector to 255 and I haven't been able to reproduce the issue. I also tested bumping it up by 1 (to 129) which resulted in 99 no longer timing out like it used to:

[INFO] [1648687538.394413767] [fibonacci_action_client_97]: client.wait_for_server() timed out? False
[INFO] [1648687538.394787646] [fibonacci_action_client_98]: client.wait_for_server() timed out? False
[INFO] [1648687538.395153215] [fibonacci_action_client_99]: client.wait_for_server() timed out? False
[WARN] [1648687538.395611024] [fibonacci_action_client_100]: client.wait_for_server() timed out? True

Earlier today, after I filed the ticket, I observed this happening with services too:

  • wait_for_service() would block indefinitely
  • wait_for_service(timeout_sec=n) to allow it to continue, ignore the timeout, and call() / call_async() would go through like normal
  • only difference was the character limit was ~117

And after doubling that line to 255 this also went away.

from rmw_cyclonedds.

pauldinh avatar pauldinh commented on August 17, 2024

Grep'ed the cyclone code base (https://github.com/eclipse-cyclonedds/cyclonedds) and that shows char topic_name[100] is all over the place. Based on the reported character count, this seems a more likely culprit. Yay, magic numbers!

All of the char topic_name[100] hits are unit test related so it's possible that's just an arbitrary number chosen for the tests. An unlucky coincidence if so.

from rmw_cyclonedds.

eboasson avatar eboasson commented on August 17, 2024

And after doubling that line to 255 this also went away.

Thanks for confirming!

All of the char topic_name[100] hits are unit test related so it's possible that's just an arbitrary number chosen for the tests. An unlucky coincidence if so

Quite so.

The discrepancy between the service name length and the buffer size is caused by the name mangling in ROS. Probably if you'd shorten your name space with a few characters the limit would have gone up to just above 100. It is just that that experiment would at best increase confusion.

So just a simple bug in the RMW layer. (To be honest, this upsets me a lot less than one in Cyclone proper, even though I feel a bit stupid for getting this loop wrong ... 🙂)

from rmw_cyclonedds.

pauldinh avatar pauldinh commented on August 17, 2024

Thanks for the quick turnaround!

from rmw_cyclonedds.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.