concert_scheduling's People

Contributors

jack-oquin, stonier

concert_scheduling's Issues

Requester workflow problem

Seems to be a problem with a particular flow of events in the scheduler:

I have an example in which the following happens.

  • A requester immediately sends off a request to the scheduler.
  • The SchedulerRequests callback _allocate_resources catches the request
  • Eventually this callback tries to construct a _RequesterStatus object here:

    self.requesters[rqr_id] = _RequesterStatus(self, msg)
    self.requesters[requester_id].send_feedback()

which doesn't exist yet, because we haven't actually got through the construction of _RequesterStatus yet, nor added it to the dictionary.

Not sure where you'd like to tackle a fix for this. If you need instructions for a reproducible example, let me know.
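To illustrate the construction-order hazard (with hypothetical stand-in classes, not the scheduler's real ones): any lookup of self.requesters[requester_id] that runs before the dictionary assignment completes will miss the entry.

```python
class RequesterStatus(object):
    """Hypothetical stand-in for _RequesterStatus."""
    def __init__(self, scheduler, requester_id, msg):
        self.msg = msg
        # Anything that consults the scheduler's dictionary during
        # construction runs before the assignment below has happened:
        self.visible_during_init = requester_id in scheduler.requesters

    def send_feedback(self):
        return 'feedback:' + self.msg


class Scheduler(object):
    def __init__(self):
        self.requesters = {}

    def handle_request(self, requester_id, msg):
        # Mirrors the two lines quoted above: construct, store, then look up.
        self.requesters[requester_id] = RequesterStatus(self, requester_id, msg)
        return self.requesters[requester_id].send_feedback()


s = Scheduler()
result = s.handle_request('rqr-1', 'hello')
assert result == 'feedback:hello'
# The failure window: during __init__ the dictionary entry did not exist yet.
assert s.requesters['rqr-1'].visible_during_init is False
```

Deferring any feedback or lookup until after the assignment, or passing the freshly constructed object directly instead of re-fetching it from the dictionary, would close the window.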

Not processing newly allocatable resources

If a new resource is discovered on concert_client_changes, the scheduler doesn't proceed to allocate that resource and finally grant a request that is waiting on it.

I'm digging around for more information now and will update back here.

scheduler_node : left vs lost resources

This is a carry-over from utexas-bwi/rocon_scheduler_requests#10. I'll try to crystallise what was there and then raise the issue.

Summary

  • We have resources within a request going missing
  • The scheduler should not act on this, just pass notification back to the requester
  • The requester has the freedom to choose its own response
    • cancel and resubmit, or delay, or other...

Current Status

Resources are currently tracked with a status flag with AVAILABLE, ALLOCATED and MISSING values. If a resource disappears from the conductor's /concert/conductor/concert_clients publication, then it is flagged as MISSING and published in turn on /concert/scheduler/resource_pool.

Issues

  1. There are actually two kinds of missing - left and lost

The first is when the concert client has left the concert. If this happens and it does come back, it will come back under a different rocon uri (probably with a counter postfixed to the name, e.g. kobuki2 instead of kobuki). The second is when it just loses its network connection to the concert and will come back at some point.

The conductor's concert_clients topic provides the necessary information for both cases.

When a client has left, it just disappears from the topic's list of clients and this is the case that is handled in the resource pool update.

When a client is lost, it will get shown (I have a bit of work to do on this this week) in the client_status variable, which indicates its current connectivity, e.g.

clients: 
  - 
    name: kobuki
    gateway_name: kobukib093a49d98e747cd9ef2f1cb337408f3
    platform_info: 
      uri: rocon:/pc/kobuki/hydro/precise
      version: acdc
      icon: 
        resource_name: ''
        format: png
    client_status: connected
    app_status: running
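
Given a message like the one above, the two cases could be distinguished along these lines. This is only a sketch: the dict-based message shape and the 'connected' comparison are assumptions modeled on the echo above, not the real conductor API.

```python
def classify_missing(tracked_names, clients_msg):
    """Return {name: 'left' | 'lost'} for tracked clients that went missing.

    clients_msg: list of dicts with 'name' and 'client_status' keys
    (an assumed stand-in for the concert_clients message).
    """
    present = {c['name']: c['client_status'] for c in clients_msg}
    missing = {}
    for name in tracked_names:
        if name not in present:
            missing[name] = 'left'   # vanished from the clients list entirely
        elif present[name] != 'connected':
            missing[name] = 'lost'   # still listed, but connectivity dropped
    return missing


clients = [
    {'name': 'kobuki', 'client_status': 'lost'},
    {'name': 'guimul', 'client_status': 'connected'},
]
assert classify_missing(['kobuki', 'guimul', 'turtlebot'], clients) == \
    {'kobuki': 'lost', 'turtlebot': 'left'}
```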

Questions

Queues and scheduler rset not in sync

The scheduler will grant a request, but the request feedback sent back to the requester carries the pre-grant status of the request (i.e. the grant update did not take effect).

I added some logging to get a handle on what is going on, the most interesting point of which is in the dispatch function. I pass it the rset variable from the callback method, i.e.

    def callback(self, rset):
        rospy.logdebug('scheduler callback:')
        for rq in rset.values():
            rospy.logwarn("DJS : request address in callback [%s]" % hex(id(rq)))
            rospy.logdebug('  ' + str(rq))
            if rq.msg.status == Request.NEW:
                self.queue(rq, rset.requester_id)
            elif rq.msg.status == Request.CANCELING:
                self.free(rq, rset.requester_id)
        self.dispatch(rset)                 # try to allocate ready requests

    def dispatch(self, rset=None):
        while len(self.ready_queue) > 0:
            # Try to allocate top element in the ready queue.
            elem = self.ready_queue.pop()
            rospy.logwarn("DJS: elem -> address [%s]" % hex(id(elem.request)))
            rospy.logwarn("DJS: elem -> id [%s]" % unique_id.toHexString(elem.request.msg.id))
            rospy.logwarn("DJS: elem -> status [%s]" % elem.request.msg.status)
            resources = []
            try:
                resources = self.pool.allocate(elem.request)
            except InvalidRequestError as ex:
                self.reject_request(elem, ex)
                continue                # skip to next queue element

            if not resources:           # top request cannot be satisfied?
                # Return it to head of queue.
                self.ready_queue.add(elem)
                break                   # stop looking

            try:
                elem.request.grant(resources)
                rospy.logwarn("DJS: dispatch -> address [%s]" % hex(id(elem.request)))
                rospy.logwarn("DJS: dispatch -> msg address [%s]" % hex(id(elem.request.msg)))
                rospy.logwarn("DJS: dispatch -> id [%s]" % unique_id.toHexString(elem.request.msg.id))
                rospy.logwarn("DJS: dispatch -> status [%s]" % elem.request.msg.status)
                if rset is not None:
                    for rq in rset.values():
                        rospy.logwarn("DJS: rq -> address [%s]" % hex(id(rq)))
                        rospy.logwarn("DJS: rq -> msg address [%s]" % hex(id(rq.msg)))
                        rospy.logwarn("DJS: rq -> id [%s]" % unique_id.toHexString(rq.msg.id))
                        rospy.logwarn("DJS: rq -> status [%s]" % rq.msg.status)
                rospy.loginfo(
                    'Request granted: ' + str(elem.request.uuid))
            except TransitionError:     # request no longer active?
                # Return allocated resources to the pool.
                self.pool.release_resources(resources)
            self.notification_set.add(elem.requester_id)

I get this output:

[WARN] [WallTime: 1396153452.730991] DJS : request address in callback [0x1f18490]
[WARN] [WallTime: 1396153452.731454] DJS: transitions [w] -> address [0x1f18490]
[WARN] [WallTime: 1396153452.731776] DJS: transitions [w] -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.732043] DJS: transitions [w] -> grant [2]
[WARN] [WallTime: 1396153452.732398] DJS: transitions [w] -> resources [[rapp: rocon_apps/teleop
id: 
  uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]]
[INFO] [WallTime: 1396153452.733212] Request queued: 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.733482] DJS: elem -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.733764] DJS: elem -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.734061] DJS: elem -> status [2]
[WARN] [WallTime: 1396153452.734742] DJS: transitions [g] -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.735097] DJS: transitions [g] -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.735369] DJS: transitions [g] -> grant [3]
[WARN] [WallTime: 1396153452.735695] DJS: transitions [g] -> resources [[rapp: rocon_apps/teleop
id: 
  uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]]
[WARN] [WallTime: 1396153452.736015] DJS: dispatch -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.736313] DJS: dispatch -> msg address [0x1eedb40]
[WARN] [WallTime: 1396153452.736619] DJS: dispatch -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.736906] DJS: dispatch -> status [3]
[WARN] [WallTime: 1396153452.737174] DJS: rq -> address [0x1f18490]
[WARN] [WallTime: 1396153452.737463] DJS: rq -> msg address [0x1eed830]
[WARN] [WallTime: 1396153452.737759] DJS: rq -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.738022] DJS: rq -> status [2]
[INFO] [WallTime: 1396153452.738291] Request granted: 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.738597] DJS: sending -> address [0x1f18490]
[WARN] [WallTime: 1396153452.738933] DJS: sending -> id 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.739228] DJS: sending -> status 2
[WARN] [WallTime: 1396153452.739579] DJS: sending -> resources [rapp: rocon_apps/teleop
id: 
  uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]
[WARN] [WallTime: 1396153452.740037] DJS: resource pool changed
[WARN] [WallTime: 1396153452.740379] DJS: publishing known resources
[WARN] [WallTime: 1396153452.740843] DJS: requester feedback - request id 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.741350] DJS: requester feedback - request status 2
[WARN] [WallTime: 1396153452.741736] DJS: requester feedback - request resources [rapp: rocon_apps/teleop

As you can see, the rset request and the popped ready_queue request are not the same thing, so any grant() operation on the popped request is irrelevant. I suspect that they are supposed to be one and the same object?
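The symptom is consistent with the ready queue holding a different object than the rset, e.g. a copy made somewhere along the way. A minimal illustration with a hypothetical stand-in class (status values 2 and 3 chosen to match the log, not the real message constants):

```python
import copy

class Request(object):
    """Hypothetical stand-in; 2 = pre-grant, 3 = granted, as seen in the log."""
    def __init__(self, status):
        self.status = status

    def grant(self):
        self.status = 3


rset = {'rq1': Request(status=2)}

# Buggy pattern: the ready queue ends up with a *copy* of the request
# rather than a reference to the object stored in the rset.
queued = copy.deepcopy(rset['rq1'])
queued.grant()

assert queued.status == 3          # the popped element was granted (dispatch log)...
assert rset['rq1'].status == 2     # ...but the rset, which feedback reads, never sees it
assert queued is not rset['rq1']   # two distinct addresses, matching the log output
```

If the queue element instead held a reference to the very object stored in the rset, the grant would be visible on the feedback path immediately.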

deb release for indigo

@jack-oquin Any chance you could do a deb release for this? I could also do it for you along with our other rocon releases if you like (letting you just worry about the sources).

concert_scheduler_requests crash

I'm afraid the error message is a bit too cryptic for me to explain, and I can't reproduce the error with certainty.

[ERROR] [WallTime: 1398033132.501574] bad callback: <bound method CompatibilityTreeScheduler._ros_subscriber_concert_client_changes of <concert_schedulers.compatibility_tree_scheduler.scheduler.CompatibilityTreeScheduler object at 0x17fd848>>
Traceback (most recent call last):
  File "/opt/ros/hydro/lib/python2.7/dist-packages/rospy/topics.py", line 682, in _invoke_callback
    cb(msg)
  File "/home/piyushk/rocon_catkin_ws/src/rocon_concert/concert_schedulers/src/concert_schedulers/compatibility_tree_scheduler/scheduler.py", line 118, in _ros_subscriber_concert_client_changes
    self._update()
  File "/home/piyushk/rocon_catkin_ws/src/rocon_concert/concert_schedulers/src/concert_schedulers/compatibility_tree_scheduler/scheduler.py", line 250, in _update
    reply.grant(resources)
  File "/home/piyushk/rocon_catkin_ws/src/concert_scheduling/concert_scheduler_requests/src/concert_scheduler_requests/transitions.py", line 306, in grant
    self._transition(EVENT_GRANT, reason=Request.NONE)
  File "/home/piyushk/rocon_catkin_ws/src/concert_scheduling/concert_scheduler_requests/src/concert_scheduler_requests/transitions.py", line 224, in _transition
    + ' in state ' + str(self.msg.status))
TransitionError: invalid event grant in state 5
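The traceback shows grant() being driven from a subscriber callback while the request is already in state 5, so the transition table rejects it. One defensive pattern, sketched here with illustrative status values (not the real scheduler_msgs constants), is to treat an out-of-state grant as a recoverable condition inside the callback rather than letting TransitionError escape:

```python
# Illustrative stand-in state values; the real constants live in scheduler_msgs.
WAITING, GRANTED, CANCELING = 2, 3, 5
GRANTABLE_STATES = {WAITING}


class TransitionError(Exception):
    pass


class Request(object):
    def __init__(self, status):
        self.status = status

    def grant(self):
        if self.status not in GRANTABLE_STATES:
            raise TransitionError('invalid event grant in state %d' % self.status)
        self.status = GRANTED


def try_grant(request):
    """Attempt a grant; report failure instead of crashing the callback."""
    try:
        request.grant()
        return True
    except TransitionError:
        return False


assert try_grant(Request(WAITING)) is True
assert try_grant(Request(CANCELING)) is False   # state 5, as in the traceback
```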

Start app on allocated resources

Do you plan to start apps upon granting resources?

We talked about this fleetingly earlier. Probably the number of schedulers is smaller than the number of requesters, so it would be good to centralise the start-app code in the few schedulers we have.

I can hack on this and send you a PR if you like.

tests: simulate ROCON conductor for testing

This is needed to provide the scheduler with resources to allocate.

The mock conductor script must simulate resources starting and stopping, both randomly and on a known timeline.
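The two timeline modes could be driven from something like the following (a sketch with hypothetical client names; a real mock would publish these events as conductor messages over ROS):

```python
import random

# Emit (time, client, event) tuples either from a fixed script or randomly.

def scripted_timeline():
    """A known, repeatable sequence of clients starting and stopping."""
    return [
        (0.0, 'kobuki', 'start'),
        (2.0, 'guimul', 'start'),
        (5.0, 'kobuki', 'stop'),
    ]


def random_timeline(clients, n_events, seed=None):
    """A random, but reproducible when seeded, sequence of start/stop events."""
    rng = random.Random(seed)
    t, events, up = 0.0, [], set()
    for _ in range(n_events):
        t += rng.uniform(0.5, 3.0)
        name = rng.choice(clients)
        event = 'stop' if name in up else 'start'
        (up.discard if event == 'stop' else up.add)(name)
        events.append((t, name, event))
    return events


assert scripted_timeline()[0] == (0.0, 'kobuki', 'start')
# Seeded runs are reproducible, which makes test failures repeatable:
assert random_timeline(['a', 'b'], 4, seed=1) == random_timeline(['a', 'b'], 4, seed=1)
```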

concert_scheduler_requests: support requests using logical connectives

Migrated from utexas-bwi/rocon_scheduler_requests#8 from @stonier:

The current specification specifies a group of resources required for the request:

Resource[] resources

which is simple, and works for a lot of situations. Jack mentioned something along the lines of making requests of requests, i.e. using logical connectives to form more particularly gnarly requests, e.g.

((two foo or one bar) and 3 baz) 

in which case, a currently specified list of resources such as five different robots would look like:

(a and b and c and d and e)

This is a nice idea and a lot more powerful than just a list of resources. Does it really have many practical use cases, though?

It would be nice to explore this if we identify a need for adding this complexity in a future iteration, so I'm posting this as a kind of TODO marker.
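As a sketch of what evaluating such a request might look like (an illustration of the idea only, not a proposed message format): represent the request as a nested expression and test it against available resource counts.

```python
def satisfiable(expr, available):
    """expr is a nested tuple: ('and', ...), ('or', ...), or a (name, count) leaf."""
    op = expr[0]
    if op == 'and':
        return all(satisfiable(e, available) for e in expr[1:])
    if op == 'or':
        return any(satisfiable(e, available) for e in expr[1:])
    name, count = expr                     # leaf: need `count` of resource `name`
    return available.get(name, 0) >= count


# ((two foo or one bar) and 3 baz), as in the example above:
request = ('and', ('or', ('foo', 2), ('bar', 1)), ('baz', 3))

assert satisfiable(request, {'bar': 1, 'baz': 3}) is True
assert satisfiable(request, {'foo': 1, 'baz': 3}) is False   # only one foo, no bar
```

Note this only answers satisfiability; a real scheduler would also have to decide which branch to actually allocate, which is where most of the added complexity would live.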

concert_scheduler_requests: add request preemption timeout

Moved from utexas-bwi/rocon_scheduler_requests#28:

The current scheduler module allows a requester to hang on to a preempted resource indefinitely. That may be useful in some cases, but relies on all requesters being responsive and well-behaved. While the scheduler protocol is intentionally co-operative, providing a timeout does seem desirable for cases where the requester is not working correctly.

We want a mechanism for the scheduler to allow a reasonable period of time for clean-up without tying up resources forever. Since a fixed time is unlikely to work for all situations, the requester and scheduler should negotiate how long to wait using the hold_time and availability fields. The scheduler's preempt() call would set availability to some reasonable time limit, and the requester could update hold_time if it has different requirements.

See utexas-bwi/rocon_scheduler_requests#10 for a longer discussion of some aspects involving lost resources.
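The negotiation might work out as follows. This is a sketch under the assumption that both fields express a wait duration in seconds; the field names follow the hold_time and availability fields mentioned above, but the combination rule is a guess for illustration, not part of the protocol.

```python
def preemption_deadline(now, availability, hold_time):
    """Scheduler proposes `availability` via preempt(); the requester may
    adjust the clean-up window via `hold_time`. The combination rule here
    (take the larger, capped by a hard maximum) is an illustrative
    assumption, not the spec."""
    HARD_CAP = 300.0   # never wait forever on a misbehaving requester
    wait = min(max(availability, hold_time), HARD_CAP)
    return now + wait


assert preemption_deadline(100.0, availability=30.0, hold_time=0.0) == 130.0
assert preemption_deadline(100.0, availability=30.0, hold_time=60.0) == 160.0
# An unresponsive or greedy requester cannot stretch the wait past the cap:
assert preemption_deadline(100.0, availability=30.0, hold_time=1e6) == 400.0
```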

scheduler_node : resource status via the requester feedback

Some background information can also be found in #23.

A requester currently has to track resource status (i.e. MISSING or not) via the resource_pool topic. While this is eminently doable, it would be convenient to have this information show up in the requester feedback function, which is where a requester (I think) typically does a lot of its decision making.

The variable that gets passed back in the requester feedback function is the RequestSet, which has resource information embedded in scheduler_msgs.Resource; that would feel like a more natural place to get resource status feedback.

Note: this doesn't have to be done, since we already have a way of getting the information. What are your thoughts on it though, Jack?
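From the requester's side, the convenience being asked for might look like this. Field names and stand-in classes are illustrative only, not the current scheduler_msgs definitions; surfacing a status on the resource is precisely what this issue proposes.

```python
MISSING = 'missing'   # illustrative status value


def feedback(rset):
    """Requester feedback callback: react to missing resources inline,
    without separately tracking the resource_pool topic."""
    lost = []
    for rq in rset.values():
        for resource in rq.resources:
            if getattr(resource, 'status', None) == MISSING:
                lost.append(resource.uri)
    return lost


class Res(object):   # stand-in for scheduler_msgs.Resource
    def __init__(self, uri, status):
        self.uri, self.status = uri, status


class Rq(object):    # stand-in for a request in the RequestSet
    def __init__(self, resources):
        self.resources = resources


rset = {'rq1': Rq([Res('rocon:/pc/kobuki', MISSING),
                   Res('rocon:/pc/guimul', 'allocated')])}
assert feedback(rset) == ['rocon:/pc/kobuki']
```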
