Implementation of the ROS Middleware (rmw) Interface using RTI's Connext DDS.
DEPRECATION NOTICE: rmw_connextdds is a new RMW implementation for RTI Connext DDS, which supersedes the one contained in this repository (rmw_connext_cpp). This new implementation was developed by RTI in collaboration with the ROS 2 community, and it resolves several performance issues that are present in this implementation.
rmw_connextdds is included in ROS 2 releases starting with Galactic.
This rmw implementation will be supported until the end-of-life of the ROS distributions it is available in (ROS 2 Dashing and Foxy).
To use rmw_connext with ROS2 applications, set the environment variable RMW_IMPLEMENTATION=rmw_connext_cpp and run your ROS2 applications as usual:
Linux:
export RMW_IMPLEMENTATION=rmw_connext_cpp
or prepend it to the ROS2 command line, for example:
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run rviz2 rviz2
Windows:
set RMW_IMPLEMENTATION=rmw_connext_cpp
Pre-built binaries for RTI Connext DDS are available, under a non-commercial license, for the x86_64 (Debian/Ubuntu) Linux platform by following the steps outlined in the ROS2 installation wiki.
Other platforms must be built from source, using a separately-installed copy of RTI Connext DDS.
This implementation of rmw_connext requires version 5.3.1 of RTI Connext DDS, which can be obtained through the RTI University Program, purchase, or as an evaluation. Note that the RTI website has Free Trial offers, but these are typically for the most-current version of RTI Connext DDS (6.0.1 as of this writing), which does not build with this implementation of rmw_connext.
Refer to the Install DDS Implementations page for details on building rmw_connext for your platform.
QoS profiles can be specified in XML according to the load order specified here. url_profile and string_profile cannot be used.
ROS will use the profile with the is_default_qos="true" attribute.
The policies defined in the ROS QoS profile will override those in the default profile, except when rmw_qos_profile_system_default is used.
For example:
<?xml version="1.0"?>
<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="http://community.rti.com/schema/5.3.1/rti_dds_qos_profiles.xsd" version="5.3.1">
  <qos_library name="Ros2TestQosLibrary">
    <qos_profile name="Ros2TestDefaultQos" base_name="BuiltinQosLib::Baseline.5.3.0" is_default_qos="true">
      <participant_qos>
        <property>
          <value>
            <!-- 6.25 MB/sec (52 Mb/sec) flow controller -->
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.max_tokens</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.tokens_added_per_period</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.bytes_per_token</name>
              <value>8192</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.sec</name>
              <value>0</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.nanosec</name>
              <value>10000000</value>
            </element>
          </value>
        </property>
      </participant_qos>
      <datawriter_qos topic_filter="rt/my_large_data_topic">
        <reliability>
          <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
        <publish_mode>
          <flow_controller_name>dds.flow_controller.token_bucket.slow_flow</flow_controller_name>
        </publish_mode>
      </datawriter_qos>
    </qos_profile>
  </qos_library>
</dds>
That will force all publishers on the my_large_data_topic topic to use the slow_flow flow controller. The reliability specified in the ROS QoS profile will still be used, unless its value is RMW_QOS_RELIABILITY_POLICY_SYSTEM_DEFAULT.
See the Topic Name Mangling section to understand the rt/ prefix.
See the RTI Connext docs to understand topic filters.
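The "6.25 MB/sec (52 Mb/sec)" comment in the profile can be checked from the token bucket properties. The sketch below is only an arithmetic illustration; the function name `token_bucket_rate` is hypothetical and not part of any RTI or ROS API. It assumes the sustained rate is tokens_added_per_period × bytes_per_token divided by the period.

```python
def token_bucket_rate(tokens_added_per_period, bytes_per_token,
                      period_sec, period_nanosec):
    """Sustained throughput of a token-bucket flow controller, in bytes/second.

    Hypothetical helper for illustration; the arguments mirror the
    dds.flow_controller.token_bucket.* property names used in the profile.
    """
    period_ns = period_sec * 10**9 + period_nanosec
    return tokens_added_per_period * bytes_per_token * 10**9 / period_ns


# Values taken from the slow_flow controller in the example profile.
rate = token_bucket_rate(8, 8192, 0, 10_000_000)
print(f"{rate:.0f} B/s = {rate / 2**20:.2f} MiB/s = {rate * 8 / 1e6:.1f} Mb/s")
# prints "6553600 B/s = 6.25 MiB/s = 52.4 Mb/s"
```

Under this reading, the "6.25 MB/sec" in the comment corresponds to 6.25 MiB/s (binary megabytes), which is about 52 megabits per second.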
To use this feature, you must first set the following environment variable:
:: Windows
set RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES=1
# Linux/MacOS
export RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES=1
If the environment variable is set and a profile name matches the DDS topic name, that profile will be used and the ROS-specified profile will be ignored.
For example:
<?xml version="1.0"?>
<dds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="http://community.rti.com/schema/5.3.1/rti_dds_qos_profiles.xsd" version="5.3.1">
  <qos_library name="Ros2TestQosLibrary">
    <qos_profile name="Ros2TestDefaultQos" base_name="BuiltinQosLib::Baseline.5.3.0" is_default_qos="true">
      <participant_qos>
        <property>
          <value>
            <!-- 6.25 MB/sec (52 Mb/sec) flow controller -->
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.max_tokens</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.tokens_added_per_period</name>
              <value>8</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.bytes_per_token</name>
              <value>8192</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.sec</name>
              <value>0</value>
            </element>
            <element>
              <name>dds.flow_controller.token_bucket.slow_flow.token_bucket.period.nanosec</name>
              <value>10000000</value>
            </element>
          </value>
        </property>
      </participant_qos>
    </qos_profile>
    <qos_profile name="rt/my_large_data_topic" base_name="BuiltinQosLib::Baseline.5.3.0">
      <datawriter_qos> <!--Don't use topic filters here-->
        <publish_mode>
          <flow_controller_name>dds.flow_controller.token_bucket.slow_flow</flow_controller_name>
        </publish_mode>
        <reliability>
          <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
      </datawriter_qos>
    </qos_profile>
  </qos_library>
</dds>
In this case, all publishers on the topic /my_large_data_topic will use the specified slow flow controller and reliable reliability (regardless of the reliability specified in ROS code).
Caveats:
- If you want to override the QoS profiles used for all publishers in a topic, the subscription profiles in the same topic will also be overridden. If you don't explicitly provide one, a default will be used.
- RTI Connext will log an error each time that it tries to find a profile that doesn't exist. You will see a lot of these logs in your terminal when using the RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES option.
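The selection rule described above can be modelled roughly as follows. This is a sketch of the documented behaviour only, not the actual rmw_connext code: the function `select_qos_profile` and its return convention are hypothetical, and treating any non-empty value of the environment variable as "set" is an assumption.

```python
def select_qos_profile(dds_topic_name, library_profiles, default_profile, env):
    """Return (profile_name, ros_qos_applied) for an entity on dds_topic_name.

    Hypothetical model of the documented rule: if topic profiles are allowed
    and a profile is named exactly like the mangled DDS topic, that profile
    wins and the ROS-specified QoS is ignored.
    """
    # Assumption: any non-empty value counts as "set".
    allow = bool(env.get("RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES"))
    if allow and dds_topic_name in library_profiles:
        return dds_topic_name, False
    # Otherwise the default profile is used and the ROS QoS is layered on top.
    return default_profile, True


profiles = {"Ros2TestDefaultQos", "rt/my_large_data_topic"}
enabled = {"RMW_CONNEXT_ALLOW_TOPIC_QOS_PROFILES": "1"}

print(select_qos_profile("rt/my_large_data_topic", profiles, "Ros2TestDefaultQos", enabled))
print(select_qos_profile("rt/chatter", profiles, "Ros2TestDefaultQos", enabled))
print(select_qos_profile("rt/my_large_data_topic", profiles, "Ros2TestDefaultQos", {}))
```

The second call corresponds to the caveat about error logs: a lookup for a profile named rt/chatter fails, so the default profile is used instead.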
If you only provided one QoS library to the process, that one will be used.
If not, the RMW_CONNEXT_QOS_PROFILE_LIBRARY environment variable must be used:
:: Windows
set RMW_CONNEXT_QOS_PROFILE_LIBRARY=Ros2TestQosLibrary
# Linux/MacOS
export RMW_CONNEXT_QOS_PROFILE_LIBRARY=Ros2TestQosLibrary
To select the default QoS profile, you can use the RMW_CONNEXT_DEFAULT_QOS_PROFILE environment variable.
When set, it overrides the profile marked with is_default_qos="true".
The profile is looked up in the QoS profile library that rmw_connext is using.
ROS always overrides the QoS profile of datawriters to use ASYNCHRONOUS_PUBLISH_MODE_QOS.
To prevent this override, you can set the following environment variable:
:: Windows
set RMW_CONNEXT_DO_NOT_OVERRIDE_PUBLICATION_MODE=1
# Linux/MacOS
export RMW_CONNEXT_DO_NOT_OVERRIDE_PUBLICATION_MODE=1
ROS uses the following mangled topics when the ROS QoS policy avoid_ros_namespace_conventions is false, which is the default:
- Topics are prefixed with rt. e.g.: /my/fully/qualified/ros/topic is converted to rt/my/fully/qualified/ros/topic.
- The service request topics are prefixed with rq and suffixed with Request. e.g.: /my/fully/qualified/ros/service request topic is rq/my/fully/qualified/ros/serviceRequest.
- The service response topics are prefixed with rr and suffixed with Response. e.g.: /my/fully/qualified/ros/service response topic is rr/my/fully/qualified/ros/serviceResponse.
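The mangling rules listed above can be written down as a small sketch. The function names are hypothetical and not part of rmw_connext. The sketch covers only the default case, where avoid_ros_namespace_conventions is false.

```python
def mangle_topic(ros_topic: str) -> str:
    """DDS topic name for a ROS topic (default namespace conventions)."""
    return "rt" + ros_topic


def mangle_service(ros_service: str) -> tuple:
    """(request, response) DDS topic names for a ROS service."""
    return ("rq" + ros_service + "Request",
            "rr" + ros_service + "Response")


print(mangle_topic("/my/fully/qualified/ros/topic"))
# prints "rt/my/fully/qualified/ros/topic"
print(mangle_service("/my/fully/qualified/ros/service"))
# prints "('rq/my/fully/qualified/ros/serviceRequest', 'rr/my/fully/qualified/ros/serviceResponse')"
```

This is why the QoS examples refer to rt/my_large_data_topic rather than /my_large_data_topic.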
Quality Declaration (per REP-2004)
See RTI Quality Declaration file, hosted on RTI Community website.
rmw_connext's Issues
race condition in graph changes and service is available
I noticed this when debugging the flaky test in rcl called test_rcl_service_server_is_available, which is in the rcl/test/rcl/test_graph.cpp file:
The race seems to be between the graph guard condition being triggered (and waiting wait sets being woken up):
And the rcl_service_server_is_available function reporting that a service that was previously available is no longer available:
Normally the test only checks this when a change occurs in the graph, but this caused this test to fail with connext periodically. So I added a condition for connext where it will check on each loop regardless of whether or not a graph change was detected:
The rcl_service_server_is_available function normally reported the right state on the next loop. This special case for connext should be removed after this is fixed.
This could be caused by graph changes getting combined through some sort of coalescing of events or it could be a delay introduced by connext, I'm not sure yet. I've decided to work around and document the issue rather than solve it now.
[connext_dynamic_cpp] Can't send/receive StaticArrayNested using Python / C typesupport
This has always been the case and is not related to the recent breakage.
Test blacklisted here
Unused WIN32 compiler definition
Bug report
Required Info:
- Operating System: Windows 10
- Installation type: from source
- Version or commit hash: 188147a
- DDS implementation: RTI Connext
- Client library (if applicable): N/A
Steps to reproduce issue
Expected behavior
rmw_connext_cpp/CMakeLists.txt, lines 116 and 122 should contain RMW_CONNEXT_CPP_BUILDING_DLL definition.
Actual behavior
The lines use ROSIDL_TYPESUPPORT_CONNEXT_CPP_BUILDING_DLL definition (copy paste error?)
"wrong type writer" error if topics with same token have different types
I have a node which subscribes to topic data and publishes reception rate data on topic reception_rate/data. Those topics have different types.
I get a TDataWriter::narrow:ERROR: Bad parameter: wrong type writer error when I try to publish, but only if it's publishing to */data. If it publishes to reception_rate/data_ there's no error. I suspect that the typesupport is being mixed up with the other topic's.
Here's how to reproduce it with two publishers in the same node:
import sys
from time import sleep

import rclpy
from std_msgs.msg import Int64, String


def main(args=None):
    if args is None:
        args = sys.argv

    rclpy.init(args=args)
    node = rclpy.create_node('talker')

    # Two publishers whose topic names share the last token but differ in type.
    chatter_pub = node.create_publisher(String, 'chatter')
    chatter_pub2 = node.create_publisher(Int64, 'test/chatter')

    msg = String()
    msg2 = Int64()
    i = 1
    while True:
        msg.data = 'Hello World: {0}'.format(i)
        msg2.data = i
        i += 1
        print('Publishing: "{0}"'.format(msg.data))
        chatter_pub.publish(msg)
        chatter_pub2.publish(msg2)
        sleep(1)


if __name__ == '__main__':
    main()
Behaviour with fastrtps (matches expected behaviour):
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run demo_nodes_py talker
Publishing: "Hello World: 1"
Publishing: "Hello World: 2"
Publishing: "Hello World: 3"
Publishing: "Hello World: 4"
$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/Int64]
Behaviour with connext (fresh daemon):
$ RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_py talker
RTI Data Distribution Service EVAL License issued to OSRF [email protected] For non-production use only.
Expires on 16-Jul-2017 See www.rti.com for more information.
Publishing: "Hello World: 1"
TDataWriter::narrow:ERROR: Bad parameter: wrong type writer
$ ros2 topic list --show-types
/chatter [std_msgs/String]
/test/chatter [std_msgs/String]
That last output showing /test/chatter having type std_msgs/String is suspicious.
[connext_dynamic] currently broken
As a regression of #194 the dynamic rmw implementation doesn't function anymore. Even for simple pub/sub examples the executables crash in the new add_information calls.
wait_for_service not being woken by graph events
While investigating the appropriate timeout for ros2/system_tests#259, I noticed a correlation between the timeout used in wait_for_service calls (20s) and the time taken for tests to run successfully.
The tests that have two wait_for_service calls with timeouts of 20s each take one of 6, 26, or 46 seconds to run. Change the wait_for_service calls to be 30s each and the tests take one of 6, 36, or 66 seconds to run. Change each wait_for_service call to be multiple 1s wait_for_service calls and the tests never take longer than 9s.
Note that the tests still pass, they just spend an unnecessary amount of time in the wait_for_service calls, presumably because the waitset is not triggered by any graph event of the service coming up.
Given that wait_for_service passes in the end, my money is on the graph event triggering before we wait on the waitset. Therefore we are waiting for something that has already occurred.
We have come across this in rmw_fastrtps_cpp before: what we need is an equivalent to ros2/rmw_fastrtps#147, which prevents guard conditions from being triggered between the time we check them to decide if we should wait, and the time we actually wait.
This seems related to #201 but distinct in that this is a race condition in services showing up as opposed to #201 being a race condition in services going away.
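The suspected lost wakeup, and one generic way to avoid it, can be illustrated outside of DDS. The sketch below is not the rmw_connext code and not the change made in ros2/rmw_fastrtps#147. The class `StickyGuardCondition` is a hypothetical stand-in for a guard condition that remembers a trigger until a waiter consumes it, with the check and the wait done under the same lock.

```python
import threading
import time


class StickyGuardCondition:
    """Hypothetical guard condition whose trigger is latched until consumed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._triggered = False

    def trigger(self):
        with self._cond:
            self._triggered = True  # latch the event so it cannot be lost
            self._cond.notify_all()

    def wait(self, timeout):
        # Checking the flag and going to sleep happen under one lock,
        # so a trigger cannot slip in between the check and the wait.
        with self._cond:
            fired = self._cond.wait_for(lambda: self._triggered, timeout=timeout)
            self._triggered = False  # consume the event
            return fired


guard = StickyGuardCondition()
guard.trigger()  # the "graph event" fires before anyone is waiting

start = time.monotonic()
woken = guard.wait(timeout=20.0)
elapsed = time.monotonic() - start
print(woken, elapsed < 1.0)
```

With a non-latching condition, the same sequence would block for the full 20 second timeout, which matches the 6, 26, or 46 second pattern reported above.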
[connext_dynamic] finish support for wait_for_service
Currently local changes to publishers and subscriptions are reflected in Connext dynamic, but not local changes to service clients and service servers, and so rmw_service_server_is_available() is still disabled for Connext dynamic (but not for Connext static):
rmw_connext/rmw_connext_dynamic_cpp/src/functions.cpp
Lines 2333 to 2335 in 8ad74fb
This can be considered follow on work from: #168
ros2 topic * commands can't always determine the type with Connext
Bug report
Required Info:
- Operating System:
- Ubuntu 16.04 AMD64
- Installation type:
- Source
- Version or commit hash:
- Release versions of crystal, from ros2/ros2#624
- DDS implementation:
- RTI Connext
- Client library (if applicable):
- rclpy-ish
Steps to reproduce issue
Terminal 1:
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker
Terminal 2:
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 topic echo /chatter
Expected behavior
The data on the /chatter topic is printed.
Actual behavior
Around 20% of the time, the ros2 topic command will hang for 3-5 seconds, then print:
Could not determine the type for the passed topic
Additional information
Re-running the relevant command usually (but not always) makes it start working. Running it a number of times in a row always makes it start running.
Also, I may have seen this with other RMW implementations (particularly Fast-RTPS), but it seems much more common on Connext (though I haven't yet ruled out other factors).
hing CI
Client can't be stopped with Ctrl-C
See ros2/rclcpp#95
- add_two_ints_client__rmw_connext_cpp can't be stopped with Ctrl-C
- same for Connext dynamic
unbounded fields workaround does not work for non-primitive datatypes in srvs
I created rcl_interfaces with some basic nested service definitions: https://github.com/ros2/rcl_interfaces/tree/master/rcl_interfaces/srv
However it crashed on generation:
WARN com.rti.ndds.nddsgen.antlr.auto.IdlLexer ParamDescription_.idl line 33 preprocessor directive not supported. It will be ignored
WARN com.rti.ndds.nddsgen.antlr.auto.IdlLexer GetParamsRequest_.idl line 25 preprocessor directive not supported. It will be ignored
INFO com.rti.ndds.nddsgen.Main Done
Traceback (most recent call last):
File "/home/tfoote/work/ros2/parameters/install/lib/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp", line 75, in <module>
sys.exit(main())
File "/home/tfoote/work/ros2/parameters/install/lib/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp", line 61, in main
service_specs,
File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 86, in generate_dds_connext_cpp
_modify(plugin_cxx_filename, unbounded_fields, _step_2_1_and_2_3_and_2_4)
File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 94, in _modify
modified = callback(unbounded_fields, lines)
File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 144, in _step_2_1_and_2_3_and_2_4
modified |= _step_2_3(unbounded_fields, lines)
File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 188, in _step_2_3
dds_type = _get_dds_type(unbounded_fields, field_name)
File "/home/tfoote/work/ros2/parameters/install/lib/python3.4/site-packages/rosidl_typesupport_connext_cpp/__init__.py", line 250, in _get_dds_type
idl_type = MSG_TYPE_TO_IDL[field.type.type]
KeyError: 'ParamDescription'
With a little debugging I localized it to this workaround for unbounded arrays: https://github.com/ros2/rmw_connext/blob/master/rosidl_typesupport_connext_cpp/rosidl_typesupport_connext_cpp/__init__.py#L76-L86
It is using a strict dictionary of primitive types. It needs to be robust to unbounded non-primitive types.
There is a more complete method msg_type_to_idl that would probably work, but the logic needs to be updated to support the arbitrary datatypes, not just the list of primitives.
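A lookup that tolerates non-primitive types might look like the sketch below. It is an illustration under stated assumptions, not the actual fix: the function `get_dds_type` and its `pkg_name` parameter are hypothetical, the mapping is only a subset of the primitive types, and the `pkg::msg::dds_::Type_` naming for nested message types is assumed rather than taken from this issue.

```python
# Illustrative subset of a primitive-type table (assumed values).
MSG_TYPE_TO_IDL = {
    'bool': 'boolean',
    'int32': 'long',
    'int64': 'long long',
    'float32': 'float',
    'float64': 'double',
    'string': 'string',
}


def get_dds_type(type_name, pkg_name=None):
    """Return a DDS type name without raising KeyError for message types."""
    if type_name in MSG_TYPE_TO_IDL:
        return MSG_TYPE_TO_IDL[type_name]
    if pkg_name is None:
        raise ValueError(
            "non-primitive type '%s' needs a package name" % type_name)
    # Assumed naming convention for nested message types.
    return '%s::msg::dds_::%s_' % (pkg_name, type_name)


print(get_dds_type('int32'))
print(get_dds_type('ParamDescription', 'rcl_interfaces'))
```

The point is only that the strict dictionary lookup needs a fallback branch for message types such as ParamDescription.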
catch error before nullptr access segfault
Bug report
Required Info:
- Operating System: Windows 10
- Installation type: from source
- Version or commit hash: 188147a
- DDS implementation: RTI Connext
- Client library (if applicable): N/A
Steps to reproduce issue
Visual code inspection
Expected behavior
Return value of _DataReader::narrow call should be checked for nullptr
Actual behavior
rosidl_typesupport_connext_cpp/resource/msg__type_support.cpp.em(167): no check
rosidl_typesupport_connext_cpp/resource/msg__type_support.cpp.em(251): no check
rosidl_typesupport_connext_c/resource/msg__type_support_c.cpp.em(279): OK, see this commit
rosidl_typesupport_connext_c/resource/msg__type_support_c.cpp.em(439): no check
Additional information
At my previous company I saw RTI Connext fail on this call. I highly recommend testing all returned pointers for NULL.
Feature request
Feature description
Implementation considerations
Please repeat this fix for all narrow calls:
9ff3275
Flaky test
The following test failed: http://54.183.26.131:8080/view/ros2/job/ros2_batch_ci_osx/97/testReport/(root)/test_publish_subscribe__rmw_opensplice_cpp/test_publish_subscribe/
But the same state passed in the other build: http://54.183.26.131:8080/view/ros2/job/ros2_batch_ci_osx/98/testReport/
Do something about name length limits
While working on ros2/rclcpp#233, it was discovered that connext can throw an error like this when trying to create a parameter service client:
PRESContentFilteredTopic_createFilterProperty:!copy content filtered property "filter expression" field: reached maximum length for content filter property (current length: 262, max. length: 256). Please consider increasing contentfilter_property_max_length parameter under participant's resource limits.
PRESContentFilteredTopic_associateReader:!copy sequence for content filtered property data
DDS_Subscriber_create_datareader_disabledI:ERROR: Failed to associate reader and content filtered topic
DDSDataReader_impl::create_disabledI:!create reader
DDSDataReader_impl::createI:!create reader
DDSDomainParticipant_impl::create_datareader:ERROR: Failed to create datareader
initialize:!create DataReader
connext::details::EntityUntypedImpl::initialize:!failed (see previous errors)
In this case, the node name was test_parameters_local_synchronous_repeated and the error above was produced when trying to create the test_parameters_local_synchronous_repeated__get_parameter_types client during the construction of a SyncParametersClient, which in turn constructs an AsyncParametersClient, which creates various clients. The service name is only 63 characters long, so presumably there's something else going on that's creating a filter expression that's 262 characters long.
For now, I'm working around the problem in the test by shortening the node name (to test_parameters_local_synch_repeated).
I don't know much about the name handling in DDS generally or connext specifically, so I'm starting by flagging this as an issue. It could be that the fix is in system configuration, not code.
Make use of ndds_namespace_cpp.h
Feature request
Feature description
Make use of ndds_namespace_cpp.h and the DDS namespace
Implementation considerations
Currently all code within this project uses DDS_ as prefix for all DDS defined types but when using ndds_namespace_cpp.h instead of ndds_cpp.h all DDS defined types are in the DDS namespace. That would simplify the porting of rmw code between the various DDS vendors because only RTI has a DDS_ prefix as alternative mapping for cpp.
dynamic service server segfaults on Windows
The test_server test in the examples package crashes on Windows with this:
Unhandled exception at 0x00007FFF11C84724 (rmw_connext_xtypes_dynamic_cpp.dll) in test_server__rmw_connext_xtypes_dynamic_cpp.exe: 0xC0000005: Access violation reading location 0x0000000000000000.
This null pointer access occurs at this line: https://github.com/ros2/rmw_connext/blob/master/rmw_connext_dynamic_cpp/src/functions.cpp#L976
Our guess as to why is that the structure being referenced here is a static const struct in a shared library, which is initialized by calling a static function in another library. This introduces an initialization-order race, which causes an issue on Windows. Basically, since one static member is not yet initialized when the other static member is initialized, it produces an invalid structure.
So the fix seems to be to just move the initialization of the struct's members to the first access at run time.
I'll open a PR against rosidl_dds.
CMake module is not working for MSVC 2015
The FindConnext.cmake module was not detecting my home-made installation of RTI Connext, which mimics the official RTI 5.1 and 5.2 directory layout. I would propose the following patch:
diff --git a/connext_cmake_module/cmake/Modules/FindConnext.cmake b/connext_cmake_module/cmake/Modules/FindConnext.cmake
index 6180438..7fb3df5 100644
--- a/connext_cmake_module/cmake/Modules/FindConnext.cmake
+++ b/connext_cmake_module/cmake/Modules/FindConnext.cmake
@@ -166,27 +166,6 @@ if(NOT "${_NDDSHOME} " STREQUAL " ")
endif()
endwhile()
- if(_matched_VS2015)
- set(_i 0)
- while(TRUE)
- list(LENGTH _libs _length)
- if(NOT ${_i} LESS ${_length})
- break()
- endif()
- list(GET _libs ${_i} _lib)
- set(_match TRUE)
- string(FIND "${_lib}" "VS2015" _found)
- if(NOT ${_found} EQUAL -1)
- set(_match FALSE)
- endif()
- if(_match)
- math(EXPR _i "${_i} + 1")
- else()
- list(REMOVE_AT _libs ${_i})
- endif()
- endwhile()
- endif()
-
The patch removes an extra check for MSVC2015, which seems unnecessary to me since the check is already done in earlier code. It is also particularly buggy in how it checks for the presence of the VS2015 string. Removing the check worked fine for me in my tests.
fix returning from rmw_wait while any samples have not been taken yet: For Services
Parallel to #62 but for services.
blocking ros2/system_tests#14
Setup windows debug libraries in Connext_LIBRARIES (cmake module)
The Visual Studio build of RTI Connext generates release and debug libraries (the debug ones have the same names with a "d" postfix). The issue is not critical as long as you don't need to debug the RTI Connext libraries. The workaround would be to modify the _excpected_library_base_names in the FindConnext.cmake package, adding a d at the end.
I would say that the proper way of getting this defined in the Find cmake module would be to use:
set(Connext_LIBRARIES optimized nddsc debug nddscd
optimized ... debug ...)
Something similar is implemented in the FindwxWindows.cmake module
When I tried a quick test for this layout ament is generating the error:
ament_export_libraries() package 'rmw_connext_cpp' passes the build configuration keyword 'debug' as the last exported library
Could it be that some parsing code in the ament package does not support this way of defining the libraries? I need to investigate it.
Testing for Beta2 coverage demo_nodes_cpp parameter tools are not working correctly on Windows
I am running the parameter nodes in demo_nodes_cpp and under connext they are not working correctly on Windows. They appear to be hanging. The events executables provide some output after the Ctrl-C but not the full equivalent of the fastrtps runs. I've waited 30+ seconds and it still responds quickly immediately after the Ctrl-C.
Output of events after Ctrl-C:
Async version:
Simple set and get just hangs:
For reference fastrtps running the same sample in the same workspace:
[rmw_connext_dynamic] "heap" corruption of the connext dynamic test_subscriber after receiving a message
The pub-sub test for connext dynamic on Windows fails.
Running the test_publisher works, and so does running the test_subscriber. But when run together the test_subscriber crashes (on shutdown it looks like) after receiving the first message.
This is the error:
Unhandled exception at 0x00007FFF1ED30F20 (ntdll.dll) in test_subscriber__rmw_connext_dynamic_cpp.exe: 0xC0000374: A heap has been corrupted (parameters: 0x00007FFF1ED6DD40).
This is the back trace:
nddscpp.dll!DDSGuardCondition::`vector deleting destructor'(unsigned int) C++
> rmw_connext_dynamic_cpp.dll!rmw_destroy_guard_condition(rmw_guard_condition_t * guard_condition) Line 755 C++
test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::executor::Executor::~Executor() Line 51 C++
test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::executors::single_threaded_executor::SingleThreadedExecutor::~SingleThreadedExecutor() Line 45 C++
test_subscriber__rmw_connext_dynamic_cpp.exe!rclcpp::spin(std::shared_ptr<rclcpp::node::Node> & node_ptr) Line 77 C++
test_subscriber__rmw_connext_dynamic_cpp.exe!main(int argc, char * * argv) Line 41 C++
This looks like a double free. I'm looking into a way to fix it.
refactor rmw_connext_dynamic_cpp functions file
Connext Dynamic should be refactored similarly to rmw_connext_shared_cpp and rmw_connext_cpp
Update the hook to call 5.2 VS2015
Simple fix to call the 5.2 instead of 5.1
--- a/connext_cmake_module/env_hook/connext.bat.in
+++ b/connext_cmake_module/env_hook/connext.bat.in
@@ -1,7 +1,7 @@
set "Connext_HOME=@Connext_HOME@"
:: Call RTI's env setup script, piping stdout to nul, since they have echo on.
-call "%Connext_HOME:/=\%\..\rti_set_env_5.1.0.bat" 1> nul
+call "%Connext_HOME:/=\%\..\rti_set_env_5.2.0.bat" 1> nul
:: Add the Connext_LIBRARY_DIR to the Path so .dll's can be found at runtime.
set "Connext_LIBRARY_DIR=@Connext_LIBRARY_DIR@"
refactor functions.cpp
Refactor rmw_connext_cpp's functions.cpp into multiple files.
FindConnext much slower since 5.3.0 upgrade
Finding Connext was taking a few seconds and is now taking ~30sec since we upgraded to Connext 5.3.0.
The main reason seems to be the execution time of rtiddsgen_server invoked from CMake.
See #253 (comment) for more details
get_datareader_qos should use DDSSubscriber and get_datawriter_qos should use DDSPublisher
Feature request
Feature description
The get_datawriter_qos and get_datareader_qos functions use an RTI Connext DDS API extension to retrieve the default datareader/datawriter QoS from the DDSDomainParticipant. According to the DDS specification, the datareader QoS should be retrieved from DDSSubscriber and the datawriter QoS should be retrieved from DDSPublisher.
Implementation considerations
By using the RTI extension this code can break in the future.
master does not compile on Windows
See this latest job: http://54.183.26.131:8080/job/ros2_batch_ci_windows/199/console
This is the relevant error:
"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\userland.sln" (default target) (1) ->
"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj.metaproj" (default target) (7) ->
"C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj" (default target) (105) ->
(ClCompile target) ->
C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\src\ros2\examples\userland\src\add_two_ints_server.cpp(50): error C2668: 'rclcpp::node::Node::create_service': ambiguous call to overloaded function [C:\Jenkins\workspace\ros2_batch_ci_windows\workspace\build\userland\add_two_ints_server.vcxproj]
It is repeated a few times. I guess this is something to do with the most recent changes. At first I thought it was the changes I was testing, but it turned out to be a problem on master too.
The commits tested are:
==> vcs log -l1 src
..................
[ ... ]
=== src\ros2\examples (git) ===
commit c7b5e7780170549dee4e36df394697d455f6ec03
Merge: 2472ad3 2a61abf
Author: Esteve Fernandez <[email protected]>
Date: Wed Apr 22 14:05:41 2015 -0700
Merge pull request #15 from ros2/request-header
Pass request header to callbacks
=== src\ros2\launch (git) ===
commit 47bb7510e0634acde3ac9048b41a525ae83cdc86
Author: Dirk Thomas <[email protected]>
Date: Mon Apr 20 12:21:45 2015 -0700
use waitpid when available
=== src\ros2\rcl (git) ===
commit a8962978a2f82059cef2097427e746e349a89d2a
Merge: 02db75d dd5f1de
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 7 17:04:52 2015 -0700
Merge pull request #2 from ros2/code_style_uncrustify
code style only
=== src\ros2\rcl_interfaces (git) ===
commit 645e40ebbfb8e857e46545b419965109989d4929
Author: Esteve Fernandez <[email protected]>
Date: Wed Apr 22 17:24:18 2015 -0700
Renamed recurse to recursive
=== src\ros2\rclc (git) ===
commit a2b2292eb34f06bd931e1a70f84368e6c651cd3d
Merge: 9615893 9609603
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 7 17:04:57 2015 -0700
Merge pull request #2 from ros2/code_style_uncrustify
code style only
=== src\ros2\rclcpp (git) ===
commit 8ad1f1f4c5b1ab5b152cc664c56f3c991eaaac4f
Merge: 1bf595d 6b6b94f
Author: Esteve Fernandez <[email protected]>
Date: Tue Apr 28 15:09:22 2015 -0700
Merge pull request #25 from ros2/spin-node-until-future-complete
Added spin_node_until_future_complete
=== src\ros2\rmw (git) ===
commit 89900d73c6ed11c67b8b10e16a9eece4e2265629
Merge: 8524e33 889dce1
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 28 12:00:17 2015 -0700
Merge pull request #11 from ros2/typesupport_for_rmw_impl
export type support for rmw implementation
=== src\ros2\rmw_connext (git) ===
commit 1aece5657d69a34e8c808943105e3bb45a66c2ad
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 28 12:10:09 2015 -0700
standardize target suffix
=== src\ros2\rmw_implementation (git) ===
commit d719244878056bc13b3c2381e97b1b618c4372d9
Author: Dirk Thomas <[email protected]>
Date: Fri Apr 3 12:30:43 2015 -0700
update license file to keep copyright template
=== src\ros2\rmw_opensplice (git) ===
commit 48e5f4f9dd32d49e2510b0ad99d5560f4fd2820a
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 28 12:10:02 2015 -0700
standardize target suffix
=== src\ros2\rosidl (git) ===
commit 47cb8d2918c927956c556df959c4b46e89a3c57e
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 28 12:10:43 2015 -0700
function to depend on include directories and libraries of generated interface target
=== src\ros2\rosidl_dds (git) ===
commit e749aecbcc49cc52d89ec0711b9a1af725a4b0eb
Author: Dirk Thomas <[email protected]>
Date: Tue Apr 28 12:09:54 2015 -0700
standardize target suffix
subscriptions, services and clients pointer arguments are not checked for NULL
Bug report
Required Info:
- Operating System:
  - all
- Installation type:
  - source
- Version or commit hash:
- DDS implementation:
  - rmw_connext
  - rmw_fastrtps
  - rmw_opensplice
Steps to reproduce issue
I did not write a test case but I can if needed
Expected behavior
NA
Actual behavior
NA
Additional information
I will provide a PR.
Solution should be like this:
// add a condition for each subscriber
if (subscriptions) {
  for (size_t i = 0; i < subscriptions->subscriber_count; ++i) {
    OpenSpliceStaticSubscriberInfo *subscriber_info =
      static_cast<OpenSpliceStaticSubscriberInfo *>(subscriptions->subscribers[i]);
    if (!subscriber_info) {
      RMW_SET_ERROR_MSG("subscriber info handle is null");
      return RMW_RET_ERROR;
    }
    DDS::ReadCondition *read_condition = subscriber_info->read_condition;
    if (!read_condition) {
      RMW_SET_ERROR_MSG("read condition handle is null");
      return RMW_RET_ERROR;
    }
    rmw_ret_t status = check_attach_condition_error(
      dds_waitset->attach_condition(read_condition));
    if (status != RMW_RET_OK) {
      return status;
    }
  }
}
@serge-nikulin fyi
Performance issues with large data
I used this tool https://github.com/ApexAI/performance_test to compare performance of rmw_connext and rmw_fastrtps.
While performance seems OK with small data, rmw_connext does not work properly with large data.
It fails to send 50 4Mb samples per second, while rmw_fastrtps can handle 500 samples per second.
This is not a problem with Connext Pro itself, as I used RTI's own tool to verify that it performs properly (https://community.rti.com/downloads/rti-connext-dds-performance-test).
All tests were done using Bouncy following the instructions from here:
https://github.com/ros2/ros2/wiki/Linux-Development-Setup
Results for 4Mb PointCloud @ 50 Hz:
Fastrtps Best Effort:
log_PointCloud4m_19-09-2018_16-05-02.pdf
Connext Pro Best Effort:
log_PointCloud4m_19-09-2018_14-30-57.pdf
Connext Pro Reliable:
log_PointCloud4m_19-09-2018_14-34-17.pdf
Results for 4Mb PointCloud @ 500 Hz:
Fastrtps Best Effort:
log_PointCloud4m_19-09-2018_16-07-02.pdf
Connext Pro Best Effort:
log_PointCloud4m_19-09-2018_14-50-57.pdf
To run a full performance investigation which also reproduces the results here you can run
python src/performance_test/performance_test/helper_scripts/run_experiment.py
as described here: https://github.com/ApexAI/performance_test
Add support for services for Connext API
Acceptance Criteria:
- Make use of ros_middleware_interface service API
Connext is very slow to shutdown
Bug report
Required Info:
- Operating System:
- Ubuntu 16.04 AMD64
- Installation type:
- Source
- Version or commit hash:
- Release versions of crystal, from ros2/ros2#624
- DDS implementation:
- RTI Connext
- Client library (if applicable):
- rclpy-ish
Steps to reproduce issue
RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker
(hit Ctrl-C here)
Expected behavior
Talker starts up, publishes some data, then quickly goes away when the user hits Ctrl-C.
Actual behavior
Talker starts up, publishes some data, then takes at least 3 seconds (sometimes longer) to go away after the user hits Ctrl-C.
Additional information
This seems to get worse with the number of nodes in the process. For the composition demos, for instance, it almost seems like it takes 3-5 seconds for each node loaded into the process.
Appropriate placement of generated idls for rmw_connext
Currently the request/response rtiddsgen generated idls are being installed in the "<pkg_name>/msg/dds_connext" folder. They should be appropriately placed in the "<pkg_name>/srv/dds_connext" folder.
segfault in all connext request-response tests on Windows
All of the Connext-based request-response tests and examples crash in the std::string implementation with this error:
Unhandled exception at 0x00007FFF11C61332 (vcruntime140d.dll) in test_server__rmw_connext_cpp.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
The back trace (of the Connext version of test_server):
> userland_msgs__rosidl_typesupport_connext_cpp.dll!std::char_traits<char>::copy(char * _First1, const char * _First2, unsigned __int64 _Count) Line 529 C++
userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right, unsigned __int64 _Roff, unsigned __int64 _Count) Line 1132 C++
userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1115 C++
userland_msgs__rosidl_typesupport_connext_cpp.dll!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::operator=(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right) Line 1003 C++
userland_msgs__rosidl_typesupport_connext_cpp.dll!connext::ReplierParams<userland_msgs::dds_::AddTwoIntsRequest_,userland_msgs::dds_::AddTwoIntsResponse_>::service_name(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name) Line 72 C++
userland_msgs__rosidl_typesupport_connext_cpp.dll!userland_msgs::service_type_support::create_replier__AddTwoInts(void * untyped_participant, const char * service_name, void * * untyped_reader) Line 74 C++
rmw_connext_cpp.dll!rmw_create_service(const rmw_node_t * node, const rosidl_service_type_support_t * type_support, const char * service_name) Line 640 C++
test_server__rmw_connext_cpp.exe!rclcpp::node::Node::create_service<userland_msgs::AddTwoInts>(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & service_name, std::function<void __cdecl(std::shared_ptr<rmw_request_id_t> const &,std::shared_ptr<userland_msgs::AddTwoIntsRequest_<std::allocator<void> > > const &,std::shared_ptr<userland_msgs::AddTwoIntsResponse_<std::allocator<void> > > &)> callback_with_header, std::shared_ptr<rclcpp::callback_group::CallbackGroup> group) Line 219 C++
test_server__rmw_connext_cpp.exe!main(int argc, char * * argv) Line 36 C++
It's my belief that this is related to ros2/ros2#31. While researching that issue, I came across a discussion online about how the libc++ developers intentionally caused a link-time error, even though they could have made it link, in order to avoid a run-time error that would occur when creating a std::string in one library and passing it to a library built with a different version. Here is the summary:
In order to turn this run time crash into a link time error, libc++ uses a C++11 language feature called inline namespace to change the ABI of std::string without impacting the API of std::string. That is, to you std::string looks the same. But to the linker, std::string is being mangled as if it is in namespace std::__1. Thus the linker knows that std::basic_string and std::__1::basic_string are two different data structures (the former coming from gcc's libstdc++ and the latter coming from libc++).
And I think that the VS2013 through VS2015 Preview headers do not use this trick, but potentially do change the implementation of basic_string, which would cause an issue. This might be expected, since they do not encourage you to mix binaries built with one version of VC with binaries built with another.
So it looks to me that we have no bug in our code, but simply need newer or from-source versions of the RTI libraries to make this work.
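To make the inline namespace trick from the quoted summary concrete, here is a minimal sketch. The namespace `demo` and the type `basic_text` are hypothetical stand-ins, not the real libc++ or MSVC headers: two incompatible layouts share one unqualified name, but the compiler and linker treat them as distinct types.

```cpp
#include <type_traits>

namespace demo {

// Old layout, kept in a regular (non-inline) namespace.
namespace v1 {
struct basic_text { const char * ptr; };
}  // namespace v1

// Current layout. Because v2 is an inline namespace, demo::basic_text
// names demo::v2::basic_text, and the mangled symbol includes "v2".
inline namespace v2 {
struct basic_text { char small_buffer[16]; };
}  // namespace v2

}  // namespace demo

// The unqualified-in-demo name resolves to the inline (current) version.
constexpr bool default_is_v2()
{
  return std::is_same<demo::basic_text, demo::v2::basic_text>::value;
}

// The two layouts are different types, so mixing them fails at link time
// instead of corrupting memory at run time.
constexpr bool v1_differs_from_v2()
{
  return !std::is_same<demo::v1::basic_text, demo::v2::basic_text>::value;
}
```

User code keeps writing `demo::basic_text`; only the mangled symbol name changes between versions, which is what lets the linker reject a mismatched library.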
Nondeterministic startup behavior
Several issues have been ticketed about a race condition between Connext and the user thread: Connext DataReaders and DataWriters are slow to establish a connection (probably due to multicast discovery). The rclcpp spin_* functions appear not to work if they are called before these entities have finished initializing:
ros2/ros2#111
ros2/rclcpp#124
ros2/system_tests#68 (pull request)
#76
There should be an option in create_subscription/publisher/service/client to block until the underlying DataReader/Writer is finished initializing.
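One way such a blocking option could be approximated is a bounded polling loop. The sketch below is only an illustration: `wait_until_ready` is a hypothetical helper, not an rmw or rclcpp function, and the `is_ready` predicate is assumed to wrap whatever check reports that the underlying DataReader/Writer is usable.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `is_ready` until it returns true or `timeout`
// elapses. Returns true if the condition was met before the deadline.
inline bool wait_until_ready(
  const std::function<bool()> & is_ready,
  std::chrono::milliseconds timeout,
  std::chrono::milliseconds poll_interval = std::chrono::milliseconds(10))
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!is_ready()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // gave up: entity still not ready
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

A timeout is used rather than an unbounded wait so that a caller never hangs if no matching remote entity ever appears.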
spin_node_once() can deadlock
It looks like spin_node_once() can get lost and never return to the caller.
Reproduction:
cd build/test_rclcpp
while ./gtest_intra_process__rmw_connext_cpp; do true; done
Update: I'm not sure whether it matters, but I was also running stress -c 8 in parallel (on my 8-core Linux box).
Eventually the test hangs with this output:
Running main() from gtest_main.cc
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from test_intra_process_within_one_node
[ RUN ] test_intra_process_within_one_node.nominal_usage
spin_node_once(nonblocking) - no callback expected
spin_node_some() - no callback expected
spin_node_once() - callback (1) expected - try 1/2
Stacktrace from attaching gdb to the deadlocked process:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fd97eb1b09b in RTIOsapiSemaphore_take () from /usr/lib/libnddscore.so.5
#2 0x00007fd97ed2b3c7 in PRESWaitSet_wait () from /usr/lib/libnddscore.so.5
#3 0x00007fd97f449e6d in DDS_WaitSet_waitI () from /usr/lib/libnddsc.so.5
#4 0x00007fd97e845ba3 in DDSWaitSet_impl::wait(DDSConditionSeq&, DDS_Duration_t const&) () from /usr/lib/libnddscpp.so.5
#5 0x00007fd98028bcd8 in wait<ConnextStaticSubscriberInfo, ConnextStaticServiceInfo, ConnextStaticClientInfo> (
subscriptions=0x7ffe441bfb80, guard_conditions=0x7ffe441bfbb0, services=0x7ffe441bfb90, clients=0x7ffe441bfba0,
wait_timeout=0x0) at /home/gerkey/ros2_ws/install/include/rmw_connext_shared_cpp/shared_functions.hpp:259
#6 0x00007fd980288465 in rmw_wait (subscriptions=0x7ffe441bfb80, guard_conditions=0x7ffe441bfbb0, services=0x7ffe441bfb90,
clients=0x7ffe441bfba0, wait_timeout=0x0) at /home/gerkey/ros2_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/functions.cpp:787
#7 0x00007fd9808fbb90 in rclcpp::executor::Executor::wait_for_work (this=0x7ffe441bffa0, timeout=...)
at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:370
#8 0x00007fd9808fcd00 in rclcpp::executor::Executor::get_next_executable (this=0x7ffe441bffa0, timeout=...)
at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:523
#9 0x00007fd9808fab1c in rclcpp::executor::Executor::spin_once (this=0x7ffe441bffa0, timeout=...)
at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:129
#10 0x00007fd9808fa61f in rclcpp::executor::Executor::spin_node_once_nanoseconds (this=0x7ffe441bffa0, node=..., timeout=...)
at /home/gerkey/ros2_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:97
#11 0x0000000000485649 in rclcpp::executor::Executor::spin_node_once<std::ratio<1l, 1000l> > (this=0x7ffe441bffa0, node=...,
timeout=...) at /home/gerkey/ros2_ws/install/include/rclcpp/executor.hpp:115
#12 0x0000000000480e3e in test_intra_process_within_one_node_nominal_usage_Test::TestBody (this=0xc35940)
at /home/gerkey/ros2_ws/src/ros2/system_tests/test_rclcpp/test/test_intra_process.cpp:77
#13 0x00000000004b0daa in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0xc35940,
method=&virtual testing::Test::TestBody(), location=0x4bdb3b "the test body") at /usr/src/gtest/src/gtest.cc:2090
#14 0x00000000004ac40a in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0xc35940,
method=&virtual testing::Test::TestBody(), location=0x4bdb3b "the test body") at /usr/src/gtest/src/gtest.cc:2126
#15 0x0000000000499f2d in testing::Test::Run (this=0xc35940) at /usr/src/gtest/src/gtest.cc:2162
#16 0x000000000049a632 in testing::TestInfo::Run (this=0xc35180) at /usr/src/gtest/src/gtest.cc:2338
#17 0x000000000049ab8e in testing::TestCase::Run (this=0xc35610) at /usr/src/gtest/src/gtest.cc:2445
#18 0x000000000049f634 in testing::internal::UnitTestImpl::RunAllTests (this=0xc352b0) at /usr/src/gtest/src/gtest.cc:4243
#19 0x00000000004b1ceb in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
object=0xc352b0,
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x49f3c6 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x4be678 "auxiliary test code (environments or event listeners)")
at /usr/src/gtest/src/gtest.cc:2090
#20 0x00000000004ad2b2 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
object=0xc352b0,
method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x49f3c6 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x4be678 "auxiliary test code (environments or event listeners)")
at /usr/src/gtest/src/gtest.cc:2126
#21 0x000000000049e59f in testing::UnitTest::Run (this=0x6e3c80 <testing::UnitTest::GetInstance()::instance>)
at /usr/src/gtest/src/gtest.cc:3880
#22 0x00000000004bb543 in main (argc=1, argv=0x7ffe441c0438) at /usr/src/gtest/src/gtest_main.cc:38
[rmw_connext_dynamic] fix cleanup functions
In #45 various cleanup functions were introduced. In various tests (e.g. in test_rclcpp) these functions print error messages:
Error in destruction of rmw publisher handle: failed to delete contained entities for publisher, at ...
Error in destruction of rmw subscription handle: failed to delete subscriber, at ...
Several function calls are not checked for returns / Sequence lengths are not checked
Bug report
- Several function calls are not checked for returns: see here https://github.com/ros2/rmw_connext/search?utf8=%E2%9C%93&q=return_loan&type=
- Sequence lengths are not checked before being accessed: https://github.com/ros2/rmw_connext/search?utf8=%E2%9C%93&q=DDS_SampleInfo+%26+sample_info+%3D+sample_infos%5B0%5D%3B&type=
I am running out of time to fix this now, but I can do it next month if no one else gets to it.
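The second point can be sketched as follows. This is an illustration only: `SampleInfo` and `std::vector` are stand-ins for the DDS sample-info type and sequence, and `first_sample_info` is a hypothetical helper, not code from rmw_connext.

```cpp
#include <vector>

// Stand-in for a DDS sample info entry (assumption for illustration).
struct SampleInfo
{
  bool valid_data;
};

// Returns a pointer to the first entry, or nullptr when the sequence is
// empty, so the caller never indexes element 0 of an empty sequence.
inline const SampleInfo * first_sample_info(const std::vector<SampleInfo> & infos)
{
  return infos.empty() ? nullptr : &infos[0];
}

// True only if a sample exists and carries valid data.
inline bool has_valid_first_sample(const std::vector<SampleInfo> & infos)
{
  const SampleInfo * info = first_sample_info(infos);
  return info != nullptr && info->valid_data;
}
```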
Accumulate DDS errors during cleanup and return error code from wait() function
Bug report
In #263 we provided a quick fix for the case where DDS returns errors, but that fix does not guarantee a complete cleanup. To quote @wjwwood:
You'd need to accumulate the errors during clean up and then return an error return code if there are any accumulated (and also find a way to communicate all of the issues encountered or else decide to just log errors that are being "overwritten" by new ones).
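The accumulation pattern described in the quote might look like the sketch below. All names here (`Ret`, `CleanupStep`, `CleanupResult`, `run_all_cleanup_steps`) are hypothetical and only model the idea; they are not the rmw_connext types or the eventual fix.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class Ret { Ok, Error };  // stand-in for rmw_ret_t (assumption)

// One cleanup action; `run` returns true on success.
struct CleanupStep
{
  std::string name;
  std::function<bool()> run;
};

struct CleanupResult
{
  Ret code = Ret::Ok;
  std::vector<std::string> failures;  // every failed step, none overwritten
};

// Runs every step even if an earlier one failed, and reports an error
// code at the very end if any failure was recorded.
inline CleanupResult run_all_cleanup_steps(const std::vector<CleanupStep> & steps)
{
  CleanupResult result;
  for (const auto & step : steps) {
    if (!step.run()) {
      result.code = Ret::Error;
      result.failures.push_back(step.name);
    }
  }
  return result;
}
```

Keeping a list of failures, rather than a single message, avoids the "overwritten error" problem mentioned in the quote; whether to log or return that list is left open here.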
To recap, we are talking about the code in this function. The call graph for this function is as follows:
rmw_ret_t wait() is used in:
- rmw_connext/rmw_connext_dynamic_cpp/src/functions.cpp => rmw_ret_t rmw_wait()
- rmw_connext/rmw_connext_cpp/src/rmw_wait.cpp => rmw_ret_t rmw_wait()
rmw_ret_t rmw_wait() is used in:
- rcl/rcl/src/rcl/wait.c => rcl_ret_t rcl_wait()
rcl_ret_t rcl_wait() is used in:
- rclcpp/rclcpp/src/rclcpp/executor.cpp => wait_for_work()
wait_for_work() is used in:
- AnyExecutable::SharedPtr Executor::get_next_executable(std::chrono::nanoseconds timeout)
get_next_executable() is used in:
- spin_once()
- spin_some()
spin_once() or spin_some() is used directly in the application.
I would first need to understand what is meant by the complete clean up. Can someone maybe fill me in a bit?
Would we need to set all
rmw_subscriptions_t * subscriptions,
rmw_guard_conditions_t * guard_conditions,
rmw_services_t * services,
rmw_clients_t * clients,
rmw_waitset_t * waitset,
to null and then also catch and handle this in rcl/rcl/src/rcl/wait.c => rcl_ret_t rcl_wait()?
Required Info:
- Operating System:
- Ubuntu 16.04
- Installation type:
- source
- Version or commit hash:
- see #263
- DDS implementation:
- RTI Connext
- Client library (if applicable):
NA
[rmw_connext_cpp] Cannot create secure nodes with current usage of partitions
Found this while investigating ros2/sros2#32 (comment).
Because all our nodes now start with parameters enabled, they create a set of service topics by default:
get_parametersReply
get_parametersRequest
get_parameter_typesReply
get_parameter_typesRequest
list_parametersReply
list_parametersRequest
describe_parametersReply
describe_parametersRequest
set_parametersReply
set_parametersRequest
These topics use partitions built from the prefix defined in the design doc and the node name (rq/<NODE_NAME> for requests and rr/<NODE_NAME> for replies).
If we define the access policies to match this, e.g.
<partitions>
<partition>rq/talker</partition>
</partitions>
<topics>
<topic>get_parametersRequest</topic>
</topics>
The node creation fails:
RTI_Security_AccessControl_check_create_datawriter:endpoint not allowed: no rule found; default DENY
DDS_DomainParticipantTrustPlugins_getLocalDataWriterSecurityState:!security function check_create_datawriter
DDS_DataWriter_create_presentation_writerI:ERROR: Failed to get local datawriter security state
DDS_DataWriter_createI:!create PRESPsWriter
DDS_Publisher_create_datawriter_disabledI:!create DataWriter
DDSDataWriter_impl::createI:!create writer
initialize:!create DataWriter
connext::details::EntityUntypedImpl::initialize:!failed (see previous errors)
>>> [rcutils|error_handling.c:155] rcutils_set_error_state()
This error state is being overwritten:
'C++ exception during construction of Requester, at /home/mikael/work/ros2/current_ws/build_debug_isolated/rcl_interfaces/rosidl_typesupport_connext_cpp/rcl_interfaces/srv/dds_connext/get_parameters__type_support.cpp:98'
with this new error message:
'failed to create requester, at /home/mikael/work/ros2/current_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/rmw_client.cpp:139'
rcutils_reset_error() should be called after error handling to avoid this.
<<<
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): could not create client: failed to create requester, at /home/mikael/work/ros2/current_ws/src/ros2/rmw_connext/rmw_connext_cpp/src/rmw_client.cpp:139, at /home/mikael/work/ros2/current_ws/src/ros2/rcl/rcl/src/rcl/client.c:174
This is due to the fact that the partition is set after the requester is created, so the requester has an empty partition when the access rules are checked.
requester creation here:
rmw_connext/rmw_connext_cpp/src/rmw_client.cpp
Lines 131 to 135 in fd85145
partition set here:
This should be fixed as soon as we get rid of the use of partitions for topic namespacing. At that point we should remove the whitelist for empty partitions here:
https://github.com/ros2/sros2/blob/69ee5b691604cebc8af822db359bba7c67a9df7d/sros2/api/__init__.py#L348-L349
Client and service do not exchange messages
See ros2/rclcpp#95
add_two_ints_server__rmw_connext_cpp/add_two_ints_client__rmw_connext_cpp do not exchange any messages.
detect participant-local graph changes
For reference:
- ros2/ros2#215 (comment)
- https://community.rti.com/forum-topic/built-publication-data-local-datawriters-and-datareaders
Basically, any newly created DataWriters and DataReaders generate an entry on a "builtin" DDS topic. In OpenSplice you get all notifications, but Connext follows a section of the DDS spec that says locally created (created in the same participant) DataWriters and DataReaders don't generate entries (the specifics are in the above links).
So for us to get notifications of local changes, we'll need to maintain some state ourselves.
Because of this I disabled the rcl_service_server_is_available function for Connext and Connext Dynamic:
rmw_connext/rmw_connext_dynamic_cpp/src/functions.cpp
Lines 3327 to 3329 in c983a8c
I also disabled the related tests in rcl (which should be re-enabled after they're fixed):
use a heuristic to determine whether or not to use asynchronous publishing
See: #183 (comment)
The proposal from the linked pull request would be to use synchronous publishing:
- if the reliability is BEST_EFFORT (type bounded or unbounded)
- if the reliability is RELIABLE and the type is bounded and the maximum size is less than MAX_SYNC_PAYLOAD
Where MAX_SYNC_PAYLOAD is some maximum size that can be used without asynchronous publishing.
Use asynchronous publishing:
- if the reliability is RELIABLE and the type is bounded and the maximum size is more than MAX_SYNC_PAYLOAD
- if the reliability is RELIABLE and the type is unbounded (has no maximum size)
Something else to consider is whether or not messages with unbounded size can always be published with synchronous publishing, even with reliability as BEST_EFFORT.
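The rules above can be written as a small decision function. This is a sketch under assumptions: the 64 KiB value for `MAX_SYNC_PAYLOAD` is made up for illustration (the issue gives no number), the names `Reliability`, `TypeBounds` and `should_publish_async` are hypothetical, and a maximum size exactly equal to the threshold is treated as synchronous because the proposal leaves that case open.

```cpp
#include <cstddef>

enum class Reliability { BestEffort, Reliable };

// Assumed threshold for illustration only; the proposal does not fix a value.
constexpr std::size_t MAX_SYNC_PAYLOAD = 64 * 1024;

// Describes whether a message type has an upper size bound.
struct TypeBounds
{
  bool bounded;          // false means the type has no maximum size
  std::size_t max_size;  // only meaningful when bounded is true
};

// Returns true when asynchronous publishing should be used.
inline bool should_publish_async(
  Reliability reliability,
  const TypeBounds & type,
  std::size_t max_sync_payload = MAX_SYNC_PAYLOAD)
{
  if (reliability == Reliability::BestEffort) {
    return false;  // synchronous, whether bounded or unbounded
  }
  if (!type.bounded) {
    return true;  // reliable and unbounded
  }
  // Reliable and bounded: asynchronous only above the threshold.
  return type.max_size > max_sync_payload;
}
```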
RTI shared memory issue
I am currently testing the namespace implementation on connext.
On the build farm I get RTI-related error messages, which I cannot reproduce locally:
18:10:14 4: [test_executable_0] [D0108|ENABLE]RTIOsapiSharedMemoryMutex_create:OS semget() failure, error 0X1C: No space left on device
18:10:14 4: [test_executable_0] [D0108|ENABLE]NDDS_Transport_Shmem_create_recvresource_rrEA:failed to initialize shared memory resource mutex for key 0xb086aa
The RTI knowledge base recommends increasing the number of allowed semaphores and similar limits, but I am a bit sceptical about that. Has any of you ever encountered similar behavior?
Import of symbols is not working fine on 5.2 Community and VS2015
We need to use some flags to get proper visibility from headers in the 5.2 Community version of RTI Connext. The following patch implements the ones needed in rmw_connext and rmw_connext_dynamic.
diff --git a/connext_cmake_module/cmake/Modules/FindConnext.cmake b/connext_cmak
index 087a563..aac2436 100644
--- a/connext_cmake_module/cmake/Modules/FindConnext.cmake
+++ b/connext_cmake_module/cmake/Modules/FindConnext.cmake
@@ -183,7 +183,12 @@ if(NOT "${_NDDSHOME} " STREQUAL " ")
set(Connext_LIBRARY_DIR "${Connext_LIBRARY_DIRS}")
if(WIN32)
- set(Connext_DEFINITIONS "RTI_WIN32" "NDDS_DLL_VARIABLE")
+ set(Connext_DEFINITIONS "RTI_WIN32"
+ "NDDS_DLL_VARIABLE"
+ "RTI_dds_c_DLL_VARIABLE"
+ "RTI_dds_cpp_DLL_VARIABLE"
+ "RTI_log_DLL_VARIABLE")
+
# This will be a .bat file and it will be on the PATH.
set(Connext_DDSGEN2 "rtiddsgen.bat")
else()
Which version of Connext DDS is used/supported in ROS 2
Due to problems with OpenSplice, I'm evaluating other DDS implementations.
I took a closer look at https://www.rti.com/products/
There are several versions of Connext DDS, such as Professional, Secure, Micro, and Cert.
Which of them are supported by ROS 2?
As far as I can see, RTI Connext is available for the Raspberry Pi (https://community.rti.com/content/forum-topic/howto-run-rti-connext-dds-raspberry-pi).
Do you know of anyone who has already tried to run ROS 2 with RTI Connext on a Raspberry Pi?
support reliable large data publishing with asynchronous publisher
When trying to publish large messages reliably with Connext (like images with programs in image_tools) we get a message like this:
COMMENDSrWriterService_write:!write. Reliable large data requires asynchronous writer.
See: https://community.rti.com/examples/asynchronous-publisher
Fixed guard conditions are not reset
@dirk-thomas and I identified an apparent bug: while the non-fixed guard conditions are reset to false after waiting, the fixed guard conditions that were added during waitset creation are never reset to false.
As a result, if you've ever done something to trigger a fixed guard condition (e.g., add a node or subscriber), then rmw_wait() on Connext should return immediately, every time you call it, without waiting. That behavior might explain many of our test failures.
The proposed fix is to add a block to the end of the wait call to reset the fixed guard conditions.
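The effect of the proposed fix can be modeled with a toy wait set. This sketch does not use the Connext API: `GuardCondition`, `WaitSetModel` and `wait_would_return_immediately` are hypothetical names, and "waiting" is reduced to checking whether any attached condition is triggered.

```cpp
#include <vector>

// Toy guard condition: just a trigger flag (assumption for illustration).
struct GuardCondition
{
  bool triggered = false;
};

struct WaitSetModel
{
  std::vector<GuardCondition *> fixed;    // attached at wait set creation
  std::vector<GuardCondition *> dynamic;  // passed in on each wait call
};

// Returns true if a real wait would have returned without blocking.
// Both groups are reset afterwards; resetting `fixed` is the proposed fix.
inline bool wait_would_return_immediately(WaitSetModel & wait_set)
{
  bool any_triggered = false;
  for (GuardCondition * gc : wait_set.dynamic) {
    any_triggered = any_triggered || gc->triggered;
    gc->triggered = false;
  }
  for (GuardCondition * gc : wait_set.fixed) {
    any_triggered = any_triggered || gc->triggered;
    gc->triggered = false;  // without this, every later wait returns at once
  }
  return any_triggered;
}
```

With the second loop in place, a fixed guard condition triggered once wakes exactly one wait call; a following call blocks again instead of returning immediately.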
Pub/sub fails across different nodes in same process
Branch multiple_nodes in system_tests, package test_rclcpp, illustrates this bug:
https://github.com/ros2/system_tests/tree/multiple_nodes
Case 1 fails for Connext and passes for Opensplice:
node1 and node2 are both added to different executors.
node1 publishes "foo", node2 subscribes to "foo"
node2 publishes "bar", node2 subscribes to "bar"
Both publishers publish 5 times.
0/5 messages are received for both subscribers.
Case 2 fails for Connext and passes for Opensplice:
node1 and node2 are both added to the same executor.
node1 publishes "foo", node2 subscribes to "foo"
node2 publishes "bar", node2 subscribes to "bar"
Both publishers publish 5 times.
0/5 messages are received for both subscribers.
Case 3 passes:
one node publishes "foo", "bar", subscribes to "foo" and "bar"
Both publishers publish 5 times.
5/5 messages are received for both subscribers.
Case 4 passes:
node1 and node2 both added to one executor.
node1 publishes "foo", subscribes to "foo"
node2 publishes "bar", subscribes to "bar"
Both publishers publish 5 times.
5/5 messages are received for both subscribers.
Code in wait function continues to possibly return RMW_RET_OK despite DDS return error
Bug report
Required Info:
- Operating System:
- Ubuntu 16.04
- Installation type:
- Version or commit hash:
- see above
- DDS implementation:
- RTI Connext
- Client library (if applicable):
- NA
Steps to reproduce issue
Quoting @dirk-thomas
After setting the error message the function should return `RMW_RET_ERROR`. The quick fix would be to just add the return statement in the cases it is missing.
The "correct" fix would be to update the structure of the code to store the return value in a value but still perform the other cleanup and only at the very end `return`.
I will provide a PR for the quick fix and have opened an issue for the correct fix.