Deion This issue is a call for a design of zero-copy intra-p

Intra-Process Communications for all language clients about design HOT 15 OPEN

ros2 commented on June 11, 2024

Intra-Process Communications for all language clients

from design.

Comments (15)

ivanpauno commented on June 11, 2024

I think that we can take some ideas from Connext "zero copy transfer over shared memory":

That's actually interprocess communication over shared memory, but something similar can be replicated using a buffer instead of a piece of shared memory.

The basic idea is that you ask the publisher for a new message instead of allocating a unique_ptr:

auto msg = publisher->get_new_message();  // borrow a message from the publisher
if (msg != nullptr) {                     // guard in case no message could be provided
  msg->data = "asd";
  publisher->publish(msg);
}
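
A minimal sketch of the buffer-backed variant, assuming a fixed pool of message slots owned by the publisher. `PoolPublisher`, `StringMsg`, and the subscriber callback are hypothetical names for illustration, not the rclcpp API. `get_new_message()` returns `nullptr` when every slot is on loan, and `publish()` hands the slot to the subscriber in place before returning it to the pool.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical message type, for illustration only.
struct StringMsg {
  std::string data;
};

// Hypothetical publisher that owns a fixed pool of message slots.
// This is not the rclcpp API; every name here is an assumption.
template <typename Msg, std::size_t N>
class PoolPublisher {
public:
  using Callback = std::function<void(const Msg &)>;

  explicit PoolPublisher(Callback on_message) : on_message_(std::move(on_message)) {}

  // Returns a pointer to a free slot, or nullptr when every slot is on loan.
  Msg * get_new_message() {
    for (std::size_t i = 0; i < N; ++i) {
      if (!in_use_[i]) {
        in_use_[i] = true;
        slots_[i] = Msg{};  // reset the slot before handing it out
        return &slots_[i];
      }
    }
    return nullptr;
  }

  // Delivers the message in place (no copy), then returns the slot to the pool.
  // The caller must not touch msg after this call.
  void publish(Msg * msg) {
    on_message_(*msg);
    in_use_[static_cast<std::size_t>(msg - slots_.data())] = false;
  }

private:
  std::array<Msg, N> slots_{};
  std::array<bool, N> in_use_{};  // value-initialized to false
  Callback on_message_;
};
```

Because the slot is released as soon as `publish()` returns, a subscriber in this sketch cannot keep the message beyond its callback, which is the lifetime restriction mentioned below.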

Currently, message lifetime can be extended beyond the scope of the callback (in C++). That would not be possible if we go ahead with something like this (or at least, that feature would be really hard to implement).

The implementation could live in rcl or rmw, I'm not sure what would be better.

allenh1 commented on June 11, 2024

@ivanpauno I don't think publisher->get_new_message() should ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking call idea, but maybe an asynchronous trigger could be set up?

Maybe it could be set up so that we can std::invoke a callback in the publish() function? This isn't great though, since it would need to be done in rcl, which means wasting cycles checking for callbacks bound with std::bind on non-shared-memory platforms.

I'm not seeing a way to make this happen in anything above rmw, except of course when there are multiple nodes inside the same process.

Sorry for the rambles, very interested in this idea.

fujitatomoya commented on June 11, 2024

Just sharing my thoughts:

> The implementation could live in rcl or rmw, I'm not sure what would be better.

I believe it is better to implement it in rmw, not rcl:

- Taking care of transport sounds like the responsibility of rmw.
- It provides a consistent, compatible API to the frontend, with the details concealed by rmw.
- It allows taking advantage of, and comparing, each rmw implementation.

emersonknapp commented on June 11, 2024

Collecting some relevant parts of the previous discussion here for easier review, and to feed the design:

Re: location of implementation @gbiggs wrote

This is a tangential comment, but I wonder if we could achieve the same zero-copies-when-same-process result by reducing the number of copies required for going into and out of the rmw layer to zero and using a DDS implementation that also supports zero copies (ignoring that there may not be any and that the standard API may not support this, both of which are solvable issues). One of the reasons for using DDS is to push all the communication issues down into an expert-vendor-supplied library, after all.

Re: location of implementation @raghaprasad wrote

How about moving the intra_process_management into an rmw?
This rmw could handle only intra_process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.

Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource heavy in terms of net memory required (at least for FastRTPS & OpenSplice) and the discovery process is CPU intensive (due to multicast).
This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter_process communication.

Re: smart-ptr messages @gbiggs wrote

But it is possible to do the rmw and rcl APIs and implementations such that they manage their raw pointers properly and provide a smart_ptr interface-compatible object in rclcpp. I'm not saying it would be easy, but this is how the STL is designed to be used and it would be the most powerful solution.
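
One way to picture a smart-pointer-compatible object over pooled raw memory is a `std::unique_ptr` with a custom deleter that returns the slot to its pool instead of freeing it. The sketch below is only an illustration under that assumption; `MessagePool`, `Msg`, and `Loan` are hypothetical names, and the pool stands in for memory that rmw or rcl would manage.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical message type, for illustration only.
struct Msg {
  int value = 0;
};

// Hypothetical pool owning raw storage; stands in for memory managed by rmw/rcl.
class MessagePool {
public:
  // Deleter that hands the slot back to the pool instead of calling delete.
  struct Returner {
    MessagePool * pool = nullptr;
    void operator()(Msg * m) const {
      if (pool != nullptr) { pool->free_.push_back(m); }
    }
  };
  using Loan = std::unique_ptr<Msg, Returner>;

  explicit MessagePool(std::size_t n) : storage_(n) {
    free_.reserve(n);  // the free list never grows past n, so push_back will not reallocate
    for (auto & m : storage_) { free_.push_back(&m); }
  }

  // Returns an empty Loan when no slot is free.
  Loan borrow() {
    if (free_.empty()) { return Loan(nullptr, Returner{this}); }
    Msg * m = free_.back();
    free_.pop_back();
    *m = Msg{};  // reset the slot before handing it out
    return Loan(m, Returner{this});
  }

  std::size_t available() const { return free_.size(); }

private:
  std::vector<Msg> storage_;  // never resized, so slot addresses stay stable
  std::vector<Msg *> free_;
};
```

User code sees ordinary `unique_ptr` semantics (`->`, `get()`, move-only ownership), while the memory never leaves the pool. The pool has to outlive every `Loan` it hands out.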

Re: implementation @ivanpauno wrote

I would like to see something mimicking the semantics of Connext's Zero Copy Transfer Over Shared Memory (by default Connext uses shared memory, but it doesn't use zero-copy transfer, which has specific semantics). Basically, instead of creating a unique pointer and then publishing it:
auto msg = std::make_unique<MSG_TYPE>();
/* Fill the message here */
publisher->publish(std::move(msg));
you ask the publisher for a piece of memory, fill it, and then publish:
auto msg = publisher->new_message();
/* Fill the message here */
publisher->publish(std::move(msg)); // I'm using move semantics because the message will be undefined after calling publish. But how we wrap the msg for this is an implementation detail.
For DDS vendors that have implemented zero-copy transport, this could just wrap it.
For others, we could have a default implementation. That implementation would not use shared memory, which allows INTERprocess zero-copy transport, but just a preallocated buffer in each publisher, which allows INTRAprocess zero-copy transport. This implementation is a good start for later doing something like this (if we want to do it).

I also think this idea will look idiomatic in other languages (for example, in python), and performance should be quite similar.

emersonknapp commented on June 11, 2024

A question: do we want to have intra-process communication always optimized in ROS2, regardless of choice of RMW?

If yes, and we want it always available, what about this idea?

- an independent, full implementation of the RMW API: rmw_intraprocess
- instantiate both:
  - rmw_intraprocess, for use by nodes within the same process
  - the cross-process rmw implementation chosen via the environment
- have the rcl or rmw layer route API calls to the appropriate one of the two co-existing RMWs, based on whether the communication is within the process
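
The routing step in the list above could look roughly like the sketch below. It is heavily simplified: `Transport` stands in for the much larger RMW API, `RecordingTransport` stands in for the environment-chosen RMW, and the `has_remote` check is a hypothetical discovery hook. None of these names exist in ROS 2.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical transport interface; a stand-in for the RMW API.
struct Transport {
  virtual ~Transport() = default;
  virtual void publish(const std::string & topic, const std::string & payload) = 0;
};

// Delivers directly to callbacks registered in this process.
class IntraProcessTransport : public Transport {
public:
  using Callback = std::function<void(const std::string &)>;

  void subscribe(const std::string & topic, Callback cb) {
    subscribers_[topic].push_back(std::move(cb));
  }
  bool has_subscriber(const std::string & topic) const {
    return subscribers_.count(topic) > 0;
  }
  void publish(const std::string & topic, const std::string & payload) override {
    auto it = subscribers_.find(topic);
    if (it == subscribers_.end()) { return; }
    for (const auto & cb : it->second) { cb(payload); }
  }

private:
  std::unordered_map<std::string, std::vector<Callback>> subscribers_;
};

// Stand-in for the cross-process RMW; it only records what it was asked to send.
class RecordingTransport : public Transport {
public:
  void publish(const std::string & topic, const std::string & payload) override {
    sent.push_back(topic + ":" + payload);
  }
  std::vector<std::string> sent;
};

// Routes each publish call to the local transport, the remote one, or both.
class RoutingTransport : public Transport {
public:
  using RemoteCheck = std::function<bool(const std::string &)>;

  RoutingTransport(IntraProcessTransport & intra, Transport & inter, RemoteCheck has_remote)
  : intra_(intra), inter_(inter), has_remote_(std::move(has_remote)) {}

  void publish(const std::string & topic, const std::string & payload) override {
    if (intra_.has_subscriber(topic)) { intra_.publish(topic, payload); }
    if (has_remote_(topic)) { inter_.publish(topic, payload); }
  }

private:
  IntraProcessTransport & intra_;
  Transport & inter_;
  RemoteCheck has_remote_;  // hypothetical hook: does any other process subscribe?
};
```

The sketch covers only the routing decision. A real router would also have to reconcile QoS settings and discovery information between the two implementations.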

Or, as another possible outcome, should we just expect intra-process communication to be the job of the chosen RMW implementation, and push development to add it to our RMW implementation of choice, e.g. FastRTPS or CycloneDDS or wherever?

dirk-thomas commented on June 11, 2024

> How about moving the intra_process_management into an rmw?
> This rmw could handle only intra_process communication and delegate inter-process communication to any of the chosen DDS rmw implementations.
>
> Support for zero copies is an important objective, but it's not the only one. It has been observed that creating DDS participants is pretty resource heavy in terms of net memory required (at least for FastRTPS & OpenSplice) and the discovery process is CPU intensive (due to multicast).
> This new rmw could drastically simplify the discovery process and most certainly reduce the memory footprint by needing only one participant per process to support inter_process communication.

The overhead described here is addressed by the proposal in #250 and isn't related to intra process communication. Even with intra process communication every node / participant has to perform discovery and comes with that overhead.

ivanpauno commented on June 11, 2024

> @ivanpauno I don't think publisher->get_new_message() should ever return nullptr. I'd prefer a more asynchronous way to fetch a message, or potentially blocking on that call instead. I'm not very fond of the blocking call idea, but maybe an asynchronous trigger could be set up?

I guess it's possible to never return nullptr (probably with locking behavior). I just added the check because I'm not sure what the implementation would look like.

> I believe it is better to implement it in rmw, not rcl:
>
> - Taking care of transport sounds like the responsibility of rmw.
> - It provides a consistent, compatible API to the frontend, with the details concealed by rmw.
> - It allows taking advantage of, and comparing, each rmw implementation.

I agree, especially with the first and last points.
Each time I think about the intra-process communication problem, I'm more convinced that it should be addressed by the underlying middleware (FastRTPS, Connext, OpenSplice, etc.), and that we should only wrap their zero-copy transfer API. Of course, that's probably out of our scope, and we have to provide a solution on top of the middleware. But that has the cost of re-implementing a lot of things (supporting a lot of different QoS features, etc.).

> Or, as another possible outcome, should we just expect intra-process communication to be the job of the chosen RMW implementation, and push development to add it to our RMW implementation of choice, e.g. FastRTPS or CycloneDDS or wherever?

👍

qootec commented on June 11, 2024

I initially posted this as a topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to discourse... I think the core of my concern touches your discussion.

(My context: ROS2 inside a machine controller)

Looking at your proposals for intra-process communication, I fail to see whether you also take into account the multi-priority requirements that such (often embedded) environments typically have.

I currently see fragmented solution elements or approaches:

- From Micro-ROS: Multiple executors could be hosted in the same process/node, each having their own queue for messages (or in fact their handlers) of the corresponding priority (based on their handlers' callbackgroup priority).
- From ROS2: ROS2 does not create its own queuing mechanism, but instead relies on the queues already available in the DDS middleware.
- From ROS2 (close to this topic): use_intra_process_comms() … if true, messages will go through a special intra-process communication code path. So potentially excluding DDS. Then how will they get queued / priority managed?
- (RTI) DDS has a Transport_Priority_QoS defined per DataWriter, which is then to be kept in sync with the cbGroup priority?
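
To make the queuing concern concrete, here is a minimal sketch of priority-ordered handling for an intra-process path that bypasses the DDS queues. It assumes a single queue ordered by callback-group priority, with FIFO order inside each priority level. `PriorityDispatcher` and `WorkItem` are hypothetical names and do not correspond to Micro-ROS or rclcpp types.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical work item: a handler plus the priority of its callback group.
struct WorkItem {
  int priority;            // larger value runs first (an assumption of this sketch)
  std::uint64_t sequence;  // arrival order, keeps FIFO order within one priority
  std::function<void()> handler;
};

// Ordering for std::priority_queue: returns true when a should run after b.
struct RunsLater {
  bool operator()(const WorkItem & a, const WorkItem & b) const {
    if (a.priority != b.priority) { return a.priority < b.priority; }
    return a.sequence > b.sequence;
  }
};

class PriorityDispatcher {
public:
  void enqueue(int priority, std::function<void()> handler) {
    queue_.push(WorkItem{priority, next_sequence_++, std::move(handler)});
  }

  // Runs the most urgent pending handler; returns false when nothing is pending.
  bool spin_once() {
    if (queue_.empty()) { return false; }
    // top() returns a const reference, so copy the handler out before popping.
    std::function<void()> handler = queue_.top().handler;
    queue_.pop();
    handler();
    return true;
  }

private:
  std::priority_queue<WorkItem, std::vector<WorkItem>, RunsLater> queue_;
  std::uint64_t next_sequence_ = 0;
};
```

This is single-threaded and leaves out preemption, so it only shows the ordering such a code path would need to preserve, not a real-time executor.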

Is there any documented vision on how your intra-process-communication would co-exist with multi-priority queuing/handling?

Johan

gavanderhoorn commented on June 11, 2024

> I initially posted this as a topic on answers.ros.org (see https://answers.ros.org/question/333180/ros2-micro-ros-intra-process/) but was advised by the moderator to move it to discourse...

I did, but this is not the embedded category on ROS Discourse.

atyshka commented on June 11, 2024

Any updates on this roughly a year later?

ivanpauno commented on June 11, 2024

> Any updates on this roughly a year later?

Not that I know of.
The problem isn't trivial, and AFAIK nobody is assigned to work on it.

twaddellberkeley commented on June 11, 2024

Hi @ivanpauno, is there any work on this problem? If not, do you need help? I would love to dive into it.

Cheers

ivanpauno commented on June 11, 2024

AFAIK, nobody is working on this right now.
I'm not sure if there's a plan to work on the topic soon.

emersonknapp commented on June 11, 2024

I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

ivanpauno commented on June 11, 2024

> I'm not sure, but does the Cyclone+iceoryx combo do this automatically for C++ nodes in the same process?

Not zero copy; zero copy requires a different API.
