
osgar's Issues

SubT Virtual: optimize multiple explorations

The robot can explore the world multiple times; for example, the code A10F200F450L means wait 10s, explore for 200s, return to the base, and then explore again for 450s along the left wall. In the second exploration it "wastes" time repeating the long way, even into dead-ends. It would be smarter to re-use the return path from the first exploration. Note that the code is already there (see follow_trace()).

Logging of large data blocks

Starting with camera #82 there is a need for data blocks larger than 0xFFFF bytes. I would propose to implement the serialization this way:

  • all data smaller than 0xFFFF bytes would be stored as it is now
  • large blocks would be stored in 0xFFFF-byte pieces, with identical timestamp and channel
  • a block whose size is an exact multiple of 0xFFFF would be followed by a block of zero length.

If you agree, I would implement it ASAP (the SICK Robot Day deadline is close).
thanks m.
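For illustration, the writer side of the proposed scheme could look like this (split_blocks is a hypothetical helper; the real code would live in osgar/logger.py):

```python
MAX_CHUNK = 0xFFFF  # current per-block limit of the log format

def split_blocks(data):
    """Yield the pieces a block would be stored as, per the proposal:
    small data stays as one block, large data is cut into 0xFFFF-byte
    pieces, and an exact multiple of 0xFFFF gets a zero-length terminator."""
    if len(data) < MAX_CHUNK:
        yield data
        return
    for i in range(0, len(data), MAX_CHUNK):
        yield data[i:i + MAX_CHUNK]
    if len(data) % MAX_CHUNK == 0:
        yield b''  # terminator so the reader knows the block has ended

# the reader concatenates pieces until one shorter than MAX_CHUNK arrives
assert list(split_blocks(b'x' * 10)) == [b'x' * 10]
assert [len(p) for p in split_blocks(b'x' * 70000)] == [65535, 4465]
assert [len(p) for p in split_blocks(b'x' * (2 * 0xFFFF))] == [65535, 65535, 0]
```

The reader side simply keeps appending pieces with the same timestamp and channel until it sees a piece shorter than 0xFFFF bytes (possibly the zero-length one).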

Use automatic code formatter

One of the things I really like about golang is its gofmt tool. The point of the tool is that you don't have to care about code style (where to put whitespace, the order of imports, etc.); the formatter does it for you. It is included in the golang plugin for vim, where it reformats the code on each save.

https://black.readthedocs.io/ is such a formatter for Python. It makes sure no semantic changes are introduced:

Also, as a temporary safety measure, Black will check that the reformatted code still produces a valid AST that is equivalent to the original. This slows it down. If you’re feeling confident, use --fast.

I'd suggest we start using it as part of our workflow. It can run on save in the editor/IDE, as a pre-commit hook in git, or something else we can come up with. With this, there would be one less thing to worry about.

Remove "slots"

We added the notion of "slots" in the middle of hacking while preparing for STIX in Denver. It helped at the time with the problem we had, but it is a hack that complicates maintaining the code. We should find out what the real problem was, fix it, and get rid of "slots".

Should bus.publish() raise BusShutdownException?

I was surprised that the following code never ended:

    def run(self):
        try:
            while True:
                self.bus.publish('tick', None)
                self.bus.sleep(self.sleep_time)
        except BusShutdownException:
            pass

Then I realized that there should be self.bus.is_alive() instead, but ... do you remember why? Now I would propose to have both ... As far as I remember, is_alive() is meant for "boundary" nodes which have only external inputs, i.e. they do not use listen(), and the external inputs could be very rare ... ???
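For comparison, a publish-only node polling is_alive() terminates cleanly even though publish() never raises; sketched here with a minimal stub bus (both classes are illustrative stand-ins, not osgar's implementation):

```python
class StubBus:
    """Minimal stand-in for OSGAR's bus, just for this illustration."""
    def __init__(self, ticks_alive):
        self.remaining = ticks_alive
        self.published = []

    def is_alive(self):
        return self.remaining > 0

    def publish(self, channel, data):
        self.published.append((channel, data))

    def sleep(self, duration):
        self.remaining -= 1  # pretend time passes; shutdown after N ticks


class Ticker:
    """Publish-only node: polls is_alive() because it never calls listen()."""
    def __init__(self, bus, sleep_time=0.1):
        self.bus = bus
        self.sleep_time = sleep_time

    def run(self):
        while self.bus.is_alive():
            self.bus.publish('tick', None)
            self.bus.sleep(self.sleep_time)


bus = StubBus(ticks_alive=3)
Ticker(bus).run()
assert len(bus.published) == 3  # the loop really stops on shutdown
```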

Upgrade to Python3.8

The sooner we upgrade, the longer we have to fix possible regressions. I believe Python3.8 is available on all platforms we currently use (sudo apt install python3.8 for ubuntu-18.04) and it is going to be the default python3 on ubuntu-20.04.

The blocker here is the availability of a working wheel IntelRealSense/librealsense#6126.

Logging of bus.shutdown()

We discussed the need to log the termination of a program run initiated by bus.shutdown() (typically from Node.request_stop()). The question is whether every stream in the log file should have a "terminal" mark, or whether we should use service channel 0 instead. It is triggered for all modules at the same time and the log file is ordered by timestamp ... so one common mark should be enough. On the other hand, this means that whenever we replay an individual node we also have to take channel 0 into account (and add some structure there to distinguish error reports, stored config, etc.). Any suggestion?
This issue is related to #110.

Danger with bus.publish() of a list

If you call self.bus.publish() with a list, you have to be aware that only a reference is sent to the listeners. So if you modify the content after publishing, the recipients may get different data than what is stored in the log file.

This issue showed up during development of feature/velodyne with the example of a buffering Accumulator:

class Processor(Node):
    def __init__(self, config, bus):
        super().__init__(config, bus)

    def update(self):
        channel = super().update()
        assert channel == 'map', channel
        print(self.time, len(self.map))
        self.sleep(0.2)  # simulate some work here
        self.publish('request', True)

    def run(self):
        try:
            self.publish('request', True)
            while True:
                self.update()
        except BusShutdownException:
            pass


class Accumulator(Node):
    def __init__(self, config, bus):
        super().__init__(config, bus)
        self.map_data = []

    def update(self):
        channel = super().update()
        if channel == 'request':
            self.publish('map', self.map_data)
        elif channel == 'xyz':
            self.map_data.extend(self.xyz)
            self.map_data = self.map_data[-100:]  # keep last 100 points

So while in the real run we got this output:

python -m osgar.record --duration 2 accumulator.json
0:00:00.015624 0
0:00:00.234369 100
0:00:00.437484 100
0:00:00.640606 100
0:00:00.843733 100
0:00:01.046844 100
0:00:01.249961 100
0:00:01.453080 100
0:00:01.656198 292
0:00:01.859317 100

and if you replay the data you will get:

python -m osgar.replay --module processor logs\accumulator-190614_152830.log
0:00:00.015624 0
0:00:00.234369 100
0:00:00.437484 100
0:00:00.640606 100
0:00:00.843733 100
0:00:01.046844 100
0:00:01.249961 100
0:00:01.453080 100
0:00:01.656198 100
0:00:01.859317 100

Note the 0:00:01.656198 292 vs. 0:00:01.656198 100. So the sender has to make a copy of the data, and the receiver has to treat received data as read-only (otherwise it may change the data for other receivers too!).
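The hazard and the copy-on-publish fix can be distilled outside OSGAR; here publish() just keeps a reference, as the in-process dispatch does:

```python
log = []  # stands in for the bus; it keeps references, not serialized copies

def publish(channel, data):
    log.append((channel, data))  # only a reference is stored

map_data = [1, 2, 3]
publish('map', map_data)           # listener and sender now share the list
map_data.append(4)                 # mutation after publish ...
assert log[0][1] == [1, 2, 3, 4]   # ... leaks into what the listener sees

log.clear()
map_data = [1, 2, 3]
publish('map', list(map_data))     # shallow copy decouples the sender
map_data.append(4)
assert log[0][1] == [1, 2, 3]      # listener keeps the published snapshot
```

In the real system the log file gets the serialized bytes at publish time while in-process listeners get the reference later, which is exactly why the recorded and replayed runs above diverge.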

Use code coverage

I don't have a specific proposal here yet. However, to help with refactoring, it would be great to know we have covered our bases with tests.

lidarview improvements

In lidarview it would be nice to:

  • be able to add more than one field into the title
  • be able to see all bounding boxes within each frame (it currently only shows one of them)

Safety - system health

Make sure that the tractor is not moving when:

  • the steering CAN module is in pre-operational mode
  • the latest laser scan is older than 0.4s
  • the pedal motor is not responding (for that we need an extra kill switch)
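The three checks could be combined into one safety predicate; a sketch with the 0.4s scan-age limit from the list (argument names are illustrative, not osgar's API):

```python
def can_move(now, last_scan_time, can_mode, pedal_alive, max_scan_age=0.4):
    """True only when all three health conditions from the list hold."""
    return (can_mode == 'OPERATIONAL'                 # steering CAN module up
            and now - last_scan_time <= max_scan_age  # laser scan is fresh
            and pedal_alive)                          # pedal motor responds

assert can_move(10.0, 9.7, 'OPERATIONAL', True)
assert not can_move(10.0, 9.5, 'OPERATIONAL', True)        # scan too old
assert not can_move(10.0, 9.7, 'PRE-OPERATIONAL', True)    # CAN not ready
assert not can_move(10.0, 9.7, 'OPERATIONAL', False)       # pedal dead
```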

Support multi-file logfiles

Yesterday we already reached a 1 hour recording (1.3GB) for the SubT ROS simulation in the Virtual Track:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/martind/md/osgar/examples/subt/mytimer.py", line 20, in run
    self.bus.publish('tick', None)
  File "/home/martind/md/osgar/osgar/bus.py", line 34, in publish
    timestamp = self.logger.write(stream_id, serialize(data))
  File "/home/martind/md/osgar/osgar/logger.py", line 69, in write
    assert dt.seconds < 3600, dt  # overflow not supported yet
AssertionError: 1:00:00.091434

So it is time to support indexed files, right? The default split criterion should be time >= 3600s, but it could also be a shorter time or a size limit. I would use the ADTF scheme for DAT files (and probably many other formats) with _000, _001, etc. names (an alternative is to use dots, i.e. .000). I would still keep the .log extension, and each part should be "independent" in the sense that if you want the recording from the 2nd hour you should not need to load the first (non-indexed) file, OK?

On the other hand, if you open the first file with default parameters it should go through all the files, and with some extra parameter it would read a single file only.

Do we want to mark in the first file that the recording is not complete?

It is surely necessary to copy all named streams from the zero stream, but should we also add the command line and config? (replay functions would probably fail on the other parts anyway)

subt-go1m-181203_195930.log
subt-go1m-181203_195930_000.log
subt-go1m-181203_195930_001.log
subt-go1m-181203_195930_002.log

Note that the 3600s limit is due to our timestamp representation, but it is nevertheless reasonable to split multi-GB files into smaller pieces.
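The proposed naming can be captured in one helper, matching the example listing above (a sketch of the scheme, not existing osgar code):

```python
import os

def part_filename(base_path, part):
    """Filename for the n-th part of a split recording: the first file keeps
    its original name, later parts get _000, _001, ... before the extension."""
    if part == 0:
        return base_path
    root, ext = os.path.splitext(base_path)
    return '{}_{:03d}{}'.format(root, part - 1, ext)

assert part_filename('subt-go1m-181203_195930.log', 0) == 'subt-go1m-181203_195930.log'
assert part_filename('subt-go1m-181203_195930.log', 1) == 'subt-go1m-181203_195930_000.log'
assert part_filename('subt-go1m-181203_195930.log', 3) == 'subt-go1m-181203_195930_002.log'
```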

SubT: handle LoRa commands in follow_trace() and return_home()

This turned out to be a fatal bug and the reason why we lost two robots (MOBoS and Maria) in the first SubT Urban run.

At the moment the LoRa commands Pause, Continue and GoHome are handled only in SubTChallenge.follow_wall, which used to be sufficient until now, when we are adding new "features". Pause is useful for coordination of multiple robots in congested areas such as narrow passages like the entrance gate. During Pause the robot should not move. Its clock should also be stopped, so that timeouts do not trigger termination of some subroutines (to make it more complex, some timeouts, like the overall exploration limit, should also take the paused time into account).

What may not be obvious is that, for example, LIDAR scans should also not be processed, as follow_trace would otherwise terminate due to another robot passing in front of the sensor (it should terminate only if there is a wall too close). So it seems that Pause is rather some kind of "freeze" or "hibernation", where the robot does not do anything, including perception of the world.

SubT Exchange of artifact positions

All robots should regularly broadcast the positions of currently found artifacts (and also the positions of artifacts found by other robots). The teambase could filter them, but if duplicates are counted only once, a direct report is OK too.
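If an artifact report is just a (type, x, y, z) tuple, the counted-once property falls out of a set union; a minimal sketch (the tuple layout is an assumption, not osgar's actual message format):

```python
def merge_reports(*reports):
    """Counted-once merge: identical (type, x, y, z) tuples collapse to one,
    so re-broadcasting other robots' artifacts does no harm."""
    return sorted(set().union(*reports))

ours = {('TYPE_BACKPACK', 10, 5, 0)}
relayed = {('TYPE_BACKPACK', 10, 5, 0), ('TYPE_PHONE', 42, -3, 1)}
assert merge_reports(ours, relayed) == [
    ('TYPE_BACKPACK', 10, 5, 0), ('TYPE_PHONE', 42, -3, 1)]
```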

stereo camera sync

When working with a stereo camera, we need to make sure that the two frames we are trying to do calculations on (e.g., a disparity map) are in sync, i.e. taken at the same moment. It doesn't appear that OSGAR publishes anything from the incoming image streams besides the images themselves, which makes keeping them in sync at the application level difficult. I recommend extracting and publishing additional fields such as seq and timestamps to support this.
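For illustration, once timestamps are published alongside the frames, pairing can be a small greedy matcher; everything below (names, the 5 ms tolerance) is hypothetical:

```python
def sync_pairs(left, right, max_dt=0.005):
    """Pair (timestamp, frame) streams by nearest timestamp; frames
    further apart than max_dt are dropped rather than mispaired."""
    pairs, j = [], 0
    for t_l, f_l in left:
        # advance the right-stream cursor while it gets closer to t_l
        while j + 1 < len(right) and \
                abs(right[j + 1][0] - t_l) <= abs(right[j][0] - t_l):
            j += 1
        t_r, f_r = right[j]
        if abs(t_r - t_l) <= max_dt:
            pairs.append((f_l, f_r))
    return pairs

left = [(0.000, 'L0'), (0.033, 'L1'), (0.066, 'L2')]
right = [(0.001, 'R0'), (0.050, 'Rx'), (0.067, 'R2')]
assert sync_pairs(left, right) == [('L0', 'R0'), ('L2', 'R2')]  # L1 dropped
```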

SubT Virtual: variable partial explorations

It would be useful to handle combinations of multiple types of exploration, i.e. A0F200LF400R would mean explore 200s along the left wall and, after returning to the base station, explore 400s along the right wall. At the moment only one global side/exploration algorithm is used per robot.

The complexity of this task should be like 2 IF statements??
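The parsing itself is indeed small; a sketch with the grammar inferred from the codes quoted in these issues (A&lt;wait&gt; followed by F&lt;seconds&gt; segments with an optional L/R side; all names are hypothetical):

```python
import re

def parse_plan(code):
    """Parse codes like 'A0F200LF400R' into (wait_s, [(duration_s, side), ...]).
    A missing side is returned as None (convention assumed here)."""
    m = re.fullmatch(r'A(\d+)((?:F\d+[LR]?)+)', code)
    assert m is not None, code
    wait = int(m.group(1))
    segments = re.findall(r'F(\d+)([LR]?)', m.group(2))
    return wait, [(int(t), side or None) for t, side in segments]

assert parse_plan('A0F200LF400R') == (0, [(200, 'L'), (400, 'R')])
assert parse_plan('A10F200F450L') == (10, [(200, None), (450, 'L')])
```

The main loop would then run one exploration per segment, returning to the base between segments, which is roughly the "2 IF statements" mentioned above.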

Reporting/logging module errors

The modules can fail to do their job, and it would be nice to see it afterwards. If the error is reported only on stderr, you may overlook it. We can use our "system stream 0" to dump the problem there (actually I already had to do it once when "hacking" an I2C failure on the boat (273d9be)), but I would prefer some nicer unified way ... any suggestion?

bus.report_error(e)?

    except OSError as e:
        print(e)
        with self.bus.logger.lock:
            self.bus.logger.write(0, bytes(str(e), encoding='ascii'))

would be now

    except OSError as e:
        self.bus.report_error(e)
?

update readme (robot.py -> record.py)

We forgot to update the readme to reflect the renaming of robot.py to record.py.

While at it we should also change the syntax from python ./osgar/record.py to python -m osgar.record.

Replace config files with python

I have been working with the config/*.json files for some time and there are certain paper cuts that I'd like to fix in the future.

It is difficult to support hardware variants. We are solving it with multiple *.json files, but that does not really scale. Sometimes I feel I'd like to share "subgraphs" of nodes between config files. Other times I'd like to set some init values based on command line arguments.

Overall I think we should just skip the idea of config files as such and replace it with regular python.
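To illustrate, the subgraph-sharing and hardware-variant pain points largely disappear once the config is an ordinary Python function; all module names and keys below are made up for the sketch, not osgar's actual config schema:

```python
def make_config(host='192.168.1.2', use_lidar=True):
    """Build the dict that a *.json config file would otherwise hold."""
    modules = {
        'robot': {'driver': 'mydrivers.robot:RobotDriver', 'init': {}},
    }
    if use_lidar:  # hardware variants become ordinary control flow
        modules['lidar'] = {'driver': 'mydrivers.lidar:Lidar',
                            'init': {'host': host}}
    return {'version': 2, 'robot': {'modules': modules, 'links': []}}

assert 'lidar' not in make_config(use_lidar=False)['robot']['modules']
cfg = make_config(host='10.0.0.5')
assert cfg['robot']['modules']['lidar']['init']['host'] == '10.0.0.5'
```

Shared subgraphs are then just helper functions returning module dicts, and command line arguments map directly onto function parameters.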

Test sometimes fail

python -m unittest -v osgar.drivers.test_logsocket.LogSocketTest.test_tcp_server

fails most often, but others might be affected as well. (When this test does not fail, it means that it has actually not been run.) There is a race condition in the code:

def test_tcp_server(self):
    with patch('osgar.drivers.logsocket.socket.socket') as mock:
        instance = mock.return_value
        instance.accept = MagicMock(return_value=('127.0.0.1', 1234))
        instance.recv = MagicMock(return_value=b'some bin data')
        logger = MagicMock()
        bus = BusHandler(logger)
        config = {'host': '192.168.1.2', 'port': 8080}
        device = LogTCPServer(config=config, bus=bus)
        device.start()
        device.request_stop()
        device.join()
        instance.listen.assert_called_once_with(1)
        instance.bind.assert_called_once_with(('192.168.1.2', 8080))

where request_stop() is (sometimes) called before the thread actually starts. We were able to find this problem thanks to a bug in this test:

instance.accept = MagicMock(return_value=('127.0.0.1', 1234))

where accept should return a new socket instance (together with the address), not just an address tuple.

Other tests following this pattern can be affected as well; the problem just does not show because there is no bug in them.

Modules should do no work in __init__

Having a connect call in __init__ can block or fail the main thread, which is responsible for starting everything. It is currently used here:

class LogTCPStaticIP(LogTCPBase):
    """
    TCP driver for existing static IP (i.e. SICK LIDAR)
    """
    def __init__(self, config, bus):
        super().__init__(config, bus)
        try:
            self.socket.connect(self.pair)
        except socket.timeout:
            print('Timeout', self.pair)
            raise

Actual hardware should be touched only later, during the startup process in the module thread. In the future this will also help us in case we want to start in multiple processes.
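One possible shape of the fix is to keep __init__ down to bookkeeping and defer the blocking connect into the module thread; a sketch, not the actual driver:

```python
class LazyTCPDriver:
    """Hypothetical driver that touches hardware only in its own thread."""
    def __init__(self, config, bus):
        # only cheap bookkeeping here; the main thread stays responsive
        self.pair = (config['host'], config['port'])
        self.bus = bus
        self.socket = None

    def run(self):
        # executed in the module thread during startup, where a timeout
        # or refused connection hurts only this module
        import socket
        self.socket = socket.create_connection(self.pair, timeout=1.0)

# constructing the driver performs no I/O, even with an unreachable host
driver = LazyTCPDriver({'host': '10.255.255.1', 'port': 2111}, bus=None)
assert driver.socket is None
```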

Specify required packages

In order to get repeatable deployment and automatic tests, we need to specify which python packages are required for osgar.

Prune branches in this repository

There are currently over 90 branches in this repo. Some of them have not been touched for over 3 years. It is about time to prune them.

LogIndexedReader crash on growing files

  File "/home/zbynek/osgar-ws/indexedlidar/osgar/logger.py", line 203, in _create_index
    micros, channel, size = struct.unpack('IHH', data[pos:pos+8])
struct.error: unpack requires a buffer of 8 bytes
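The crash looks like _create_index read a record that was only partially flushed to the growing file; a bounds check before unpacking would let the reader back off and retry (the 'IHH' layout is taken from the traceback; the helper itself is hypothetical):

```python
import struct

RECORD_HEADER = struct.Struct('IHH')  # micros, channel, size, per the traceback

def try_read_header(data, pos):
    """Return the parsed header, or None when the growing file does not yet
    contain the whole 8-byte header at pos."""
    if pos + RECORD_HEADER.size > len(data):
        return None  # partial record at the tail; retry after the file grows
    return RECORD_HEADER.unpack_from(data, pos)

buf = RECORD_HEADER.pack(123456, 1, 4) + b'abcd'
assert try_read_header(buf, 0) == (123456, 1, 4)
assert try_read_header(buf, len(buf) - 3) is None  # incomplete tail
```

The same check is needed for the payload: even with a full header, `size` bytes of data may not have landed in the file yet.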

SubT System - LoRa STOP on return_home

LoRa at the moment works only during the initial exploration. The same code used to be shared for the return, but with the optimization and the integration of LocalPlanner into System this is no longer true. The same holds for PAUSE and RESUME. Note that in these cases the timeout modifications have to be taken into account.

Use different and proprietary log file extension

Currently the logs are created with the extension .log. This is very generic: it may already be associated with another app on a given system, or it may be undesirable to associate such a common extension with an osgar-specific utility. I would recommend switching the default extension to something unique, perhaps .olog.

Integrate slope lidar

The slope lidar is used on the robot Eduro to scan the traversability of the surface in front of the robot. The current mounting corresponds to a 60cm height and 145cm readings, i.e. a 25deg tilt:

>>> math.degrees(math.asin(60/145))
24.443335427697384

Note that the slope lidar is relatively sparse and has to be integrated into some local map, otherwise obstacles like a wooden barrier would be overlooked (it is visible at some distance, but if you move closer it "disappears").

Use pylint

Let's face it. The code base is getting really big. Static analysis should help us stay on top of things.

Running pylint in its default configuration on osgar/record.py gives me the following output:

************ Module osgar.record
osgar/record.py:83:2: W0511: TODO nicer reference (fixme)
osgar/record.py:43:0: C0301: Line too long (101/100) (line-too-long)
osgar/record.py:16:0: C0115: Missing class docstring (missing-class-docstring)
osgar/record.py:17:4: R0914: Too many local variables (19/15) (too-many-locals)
osgar/record.py:49:4: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:53:4: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:56:4: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:60:4: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:60:27: W0613: Unused argument 'sig' (unused-argument)
osgar/record.py:60:37: W0613: Unused argument 'frame' (unused-argument)
osgar/record.py:67:4: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:72:0: C0116: Missing function or method docstring (missing-function-docstring)
osgar/record.py:73:7: C0123: Using type() instead of isinstance() for a typecheck. (unidiomatic-typecheck)
osgar/record.py:91:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 8.11/10

What caught my eye here is R0914:

Used when a method or function uses more than 15 variables in the namespace.
Some programmers consider that using several local variables in one function or method is an indicator that the function or method is too complex, or trying to do too much.

And in a different branch I am already refactoring Recorder.__init__ because I've found it hard to read and/or refactor.

Configuration file for navpat.py

The start pose and the location of the cones vary slightly (15x5m is the default, but 12x5m fits better for the presentation, for example). Experiments with the start position complicate it further (see #9). Moreover, the set of patterns could be more complex(?). Later we may want the robot to self-localize, or to generate the map on the first run and use it for the following runs, etc.

The configuration should be a text file, i.e. easily readable and accessible from other scripts/programs without the need for complex serialization/deserialization. The options are JSON, XML, YAML (requires an external library), or something else. Any preference?

SubT TODO List

This is a simple overview of tasks we should do in OSGAR in order to succeed in the SubT Challenge (both the Virtual and System Track competitions). This first overview section will be edited and linked to PRs or other issues.

Common Tasks

  • fast navigation in the middle of the straight tunnel

System Track

Virtual Track

  • test communication model for two robots

Starting from scratch

Hi,
Starting from scratch is a bit complicated.

You have a tractor, which is the right thing ... but now, what else?

I suppose I have to set up some actuators and sensors; where can I find explanations?

Thanks

SubT Virtual: detect return to base station for long runs

If the world is relatively small and the robot has the code A60F2400L (explore for 40 minutes), then it can happen that the robot returns to the base station sooner. The current code would continue and explore the world a 2nd time.

It would be wiser to detect such a case, report the found artifacts and stop: it does not make sense to explore the world again with the same strategy.

It should be possible to recognize this from the distance to the DARPA origin (0, 0, 0), the gate entrance, verified via get_origin.

SubT Virtual: dynamic exploration algorithm

It would be nice to support a variable exploration mode, i.e. for example the code A0F200L40C400R, where segments after the first are written without the F separator (CloudSim does not support underscores), would mean explore 200s along the left wall, then navigate 40s to the empty space in the center, and then add 400s along the right wall.

Provide tools for stacked strategy

Not necessarily a "three-layered" architecture, but maybe yes. At the moment it would perhaps be enough to extend monitors like EmergencyStop to others ("VirtualBumper", "LoRa Pause"). The programmer of a "go straight" function should not have to worry about collisions and remote commands, but they would have the option to select what has to be handled "in the background". The robot can be remotely paused during "Go 1 meter", terminated, or returned along the traversed trajectory.

`application` argument should be string

Where we supply application as an argument to a function (like record, replay, ...) we should supply a string instead, one that can be used with get_class_by_name. With this change we can replay logs using only the generic tools, and we can drop the replay capability from e.g. subt/main.py.

It will improve record as well, although not as much, since we have custom command line arguments. But it is a step in the right direction anyway.

Zipped sparse data

There is a need to compress some external raw data like the ROS sensor_msgs/PointCloud2 message (see https://robotika.cz/competitions/subtchallenge/cs#190105 for motivation). This should probably happen at the very low level of serialize/deserialize. Any suggestion? Should we provide an option like "raw|zip" or "raw.zip" on the definition of a node output, where "raw" would remain uncompressed?
p.s. later I realized that I had to deal with the same problem a year ago, when logging stereo data from the Naio Oz robot (https://github.com/robotika/naio/blob/master/myr2017.py#L266).
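A minimal sketch of the proposed "raw" vs. "raw.zip" switch using the stdlib zlib (the flag names are placeholders for whatever the node-output definition would carry):

```python
import zlib

def serialize(data, compress=False):
    """Hypothetical extension: a node output declared as "raw.zip" would
    pass compress=True; plain "raw" stays untouched."""
    return zlib.compress(data) if compress else data

def deserialize(blob, compressed=False):
    return zlib.decompress(blob) if compressed else blob

cloud = b'\x00' * 100000  # sparse point clouds are mostly zeros
packed = serialize(cloud, compress=True)
assert deserialize(packed, compressed=True) == cloud
assert len(packed) < len(cloud) // 100  # zlib thrives on sparse data
```

Whether a stream is compressed would be recorded per channel in the log header, so readers never have to guess.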

Log recovery

It would be nice to be able to recover an OSGAR log file with a missing header ... say the user has only the last 10MB of the logfile. The log is dense and there are no "transaction points". Actually, even a hint on how to do it manually would be nice. Note that I have another logfile with a header, but the order of channels does not have to be identical.
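As a starting point for manual recovery, one can scan the fragment for an offset where a long chain of plausible (micros, channel, size) headers lines up; a heuristic sketch assuming the 'IHH' record layout from osgar/logger.py:

```python
import struct

RECORD_HEADER = struct.Struct('IHH')  # micros, channel, size

def find_sync(data, min_chain=10):
    """Return the first offset where a chain of plausible records runs to
    the end of the buffer, or None (a recovery heuristic, nothing more)."""
    for start in range(len(data)):
        pos, count = start, 0
        while pos + RECORD_HEADER.size <= len(data):
            micros, channel, size = RECORD_HEADER.unpack_from(data, pos)
            if channel > 100:  # stream ids are small integers in practice
                break          # implausible header, try the next offset
            pos += RECORD_HEADER.size + size
            count += 1
        else:
            if count >= min_chain:
                return start
    return None

# 20 well-formed records preceded by 3 bytes of junk
records = b''.join(RECORD_HEADER.pack(t * 1000, 1, 4) + b'data'
                   for t in range(20))
fragment = b'\xff\xff\xff' + records
assert find_sync(fragment) == 3
```

A reasonable chain length makes false positives unlikely; from the sync point onward, the records can be re-timestamped against the other logfile by matching channel contents rather than channel numbers.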

Upgrade to latest opencv (mainly to get better DNN support)

We are currently pinning the version of opencv to be <4. Newer versions have the dnn module implemented using OpenVINO, which is said to be optimized for inference on Intel CPUs and which transparently supports inference on the Intel Movidius 2 USB stick.

Some background links:

Going this way we might avoid having to maintain yet another dependency (like TensorFlow Lite or something else).
