utexas-bwi / concert_scheduling

Scheduler support packages for the Robotics in Concert project

Home Page: http://wiki.ros.org/concert_scheduling
The current logic only rejects an invalid request when it reaches the head of the queue.
Some problems can only be found at allocation time, but others could be rejected earlier.
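A minimal sketch of what earlier rejection could look like, assuming a hypothetical _validate() helper called when the request is queued (reject_request and InvalidRequestError follow the dispatch snippet later in this discussion; QueueElement and _validate are made up):

```python
def queue(self, request, requester_id):
    """Hypothetical early-validation variant of queue()."""
    try:
        self._validate(request)           # cheap static checks only
    except InvalidRequestError as ex:
        self.reject_request(request, ex)  # reject immediately,
        return                            # never enters the queue
    self.ready_queue.add(QueueElement(request, requester_id))

def _validate(self, request):
    """Checks that do not depend on current pool state."""
    if not request.msg.resources:
        raise InvalidRequestError('request contains no resources')
```

Allocation-time problems (no matching resource currently free) would still surface at the head of the queue, as before.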
In the Scheduler and Requester constructors: when topic is not specified, search for a topic_name ROS parameter, using rocon_scheduler if none is found.
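A minimal sketch of that lookup (rospy.get_param with a default is standard rospy; the constructor shape here is an assumption):

```python
import rospy

class Scheduler(object):
    def __init__(self, callback, topic=None):
        if topic is None:
            # Fall back to the topic_name parameter, then to the default.
            topic = rospy.get_param('topic_name', 'rocon_scheduler')
        self.topic = topic
```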
The current stub only copies the exact resource names requested.
More work remains for full resource matching in the pool.
if hasattr(Request, "INVALID"):  # new reason code defined?
    element.request.cancel(Request.INVALID)
Are you planning this, or have you already implemented it?
We need a test case that demonstrates this bug.
We can live with it for a while, because the scheduler is not currently changing priorities.
Seems to be a problem with a particular flow of events in the scheduler. I have an example in which the following happens: while executing

self.requesters[rqr_id] = _RequesterStatus(self, msg)

the concert_simple_scheduler callback fires and calls

self.requesters[requester_id].send_feedback()

which fails, because we actually haven't got through the construction of _RequesterStatus yet, nor added it to the dictionary.
Not sure where you'd like to tackle a fix for this. If you need instructions for a reproducible example, let me know.
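One possible shape for a fix, sketched as an assumption rather than the actual concert_simple_scheduler code: separate construction from activation, so the dictionary entry exists before any callback can reach for it:

```python
def _register_requester(self, rqr_id, msg):
    # Build the status object without side effects first
    # (the defer_setup flag and setup() method are hypothetical)...
    status = _RequesterStatus(self, msg, defer_setup=True)
    self.requesters[rqr_id] = status   # ...register it...
    status.setup()                     # ...then subscribe and send feedback
```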
If a new resource is discovered on concert_client_changes, the scheduler doesn't proceed to allocate that resource and finally grant a request that is waiting.
I'm digging around for more information now and will update back here.
The peek() method would return the head of the queue without removing it.
This is a carry over from utexas-bwi/rocon_scheduler_requests#10. I'll try to crystallise what was there and then raise the issue.
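For reference, a minimal sketch of peek(), assuming the ready queue is heap-backed (heapq keeps the smallest element at index 0):

```python
import heapq

class ReadyQueue(object):
    def __init__(self):
        self._heap = []

    def add(self, element):
        heapq.heappush(self._heap, element)

    def pop(self):
        return heapq.heappop(self._heap)

    def peek(self):
        """Return the head of the queue without removing it."""
        if not self._heap:
            raise IndexError('peek from an empty queue')
        return self._heap[0]
```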
Summary
Current Status
Resources are currently tracked with a status flag taking AVAILABLE, ALLOCATED and MISSING values. If a resource disappears from the conductor's /concert/conductor/concert_clients publication, it is flagged as MISSING and published in turn on /concert/scheduler/resource_pool.
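A minimal sketch of that update step (CurrentStatus and its constants are assumed to live in scheduler_msgs; the real pool code may differ):

```python
from scheduler_msgs.msg import CurrentStatus

def update_pool(pool, known_clients):
    """Flag pool resources that vanished from the conductor's list."""
    for name, resource in pool.items():
        if name not in known_clients:
            resource.status = CurrentStatus.MISSING
        elif resource.status == CurrentStatus.MISSING:
            # The client came back: make it schedulable again.
            resource.status = CurrentStatus.AVAILABLE
```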
Issues

There are two distinct ways a client can disappear: left and lost.
The first is when the concert client has left the concert. If this happens and it does come back, it will come back under a different rocon uri (probably with a postfixed counter to the name, e.g. kobuki2 instead of kobuki). The second is when it just loses its network connection to the concert and will come back at some point.
The conductor's concert_clients topic provides the necessary information for both cases.
When a client has left, it just disappears from the topic's list of clients; this is the case that is handled in the resource pool update.
When a client is lost, it will (I have a bit of work to do this week on this) get shown in the client_status variable, which indicates its current connectivity, e.g.
clients:
  - name: kobuki
    gateway_name: kobukib093a49d98e747cd9ef2f1cb337408f3
    platform_info:
      uri: rocon:/pc/kobuki/hydro/precise
      version: acdc
      icon:
        resource_name: ''
        format: png
    client_status: connected
    app_status: running
Questions

Should left clients stay in the resource pool? This is important for tracking an individually left resource in a request containing multiple resources.

The documentation link you have in the readme is probably going to the correct url:
http://farnsworth.csres.utexas.edu/docs/concert_simple_scheduler/html/
but looks as though you've accidentally uploaded the rocon_request_scheduler docs there.
The current implementation still uses the Python re package to match requests to resources.
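A sketch of what re-based matching of a requested rocon uri against pool uris might look like (the helper and the wildcard convention are assumptions, not the actual pool code):

```python
import re

def match_resource(request_uri, pool_uris):
    """Return the first pool uri matching the requested pattern."""
    # Treat '*' in a rocon uri as a wildcard; everything else is literal.
    pattern = re.compile(
        '^' + '.*'.join(re.escape(part)
                        for part in request_uri.split('*')) + '$')
    for uri in pool_uris:
        if pattern.match(uri):
            return uri
    return None

# match_resource('rocon:/pc/*/hydro/precise',
#                ['rocon:/pc/kobuki/hydro/precise'])
# -> 'rocon:/pc/kobuki/hydro/precise'
```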
The scheduler will grant a request, but in the request feedback back to the requester, the pre-grant status of the request is received (i.e. the grant update did not take effect).
I added some logging to get a handle on what is going on; the most interesting point is in the dispatch function. I pass it the rset variable via the callback method, i.e.:
def callback(self, rset):
    rospy.logdebug('scheduler callback:')
    for rq in rset.values():
        rospy.logwarn("DJS : request address in callback [%s]" % hex(id(rq)))
        rospy.logdebug('  ' + str(rq))
        if rq.msg.status == Request.NEW:
            self.queue(rq, rset.requester_id)
        elif rq.msg.status == Request.CANCELING:
            self.free(rq, rset.requester_id)
    self.dispatch(rset)                 # try to allocate ready requests

def dispatch(self, rset=None):
    while len(self.ready_queue) > 0:
        # Try to allocate top element in the ready queue.
        elem = self.ready_queue.pop()
        rospy.logwarn("DJS: elem -> address [%s]" % hex(id(elem.request)))
        rospy.logwarn("DJS: elem -> id [%s]" % unique_id.toHexString(elem.request.msg.id))
        rospy.logwarn("DJS: elem -> status [%s]" % elem.request.msg.status)
        resources = []
        try:
            resources = self.pool.allocate(elem.request)
        except InvalidRequestError as ex:
            self.reject_request(elem, ex)
            continue                    # skip to next queue element
        if not resources:               # top request cannot be satisfied?
            # Return it to head of queue.
            self.ready_queue.add(elem)
            break                       # stop looking
        try:
            elem.request.grant(resources)
            rospy.logwarn("DJS: dispatch -> address [%s]" % hex(id(elem.request)))
            rospy.logwarn("DJS: dispatch -> msg address [%s]" % hex(id(elem.request.msg)))
            rospy.logwarn("DJS: dispatch -> id [%s]" % unique_id.toHexString(elem.request.msg.id))
            rospy.logwarn("DJS: dispatch -> status [%s]" % elem.request.msg.status)
            if rset is not None:
                for rq in rset.values():
                    rospy.logwarn("DJS: rq -> address [%s]" % hex(id(rq)))
                    rospy.logwarn("DJS: rq -> msg address [%s]" % hex(id(rq.msg)))
                    rospy.logwarn("DJS: rq -> id [%s]" % unique_id.toHexString(rq.msg.id))
                    rospy.logwarn("DJS: rq -> status [%s]" % rq.msg.status)
            rospy.loginfo(
                'Request granted: ' + str(elem.request.uuid))
        except TransitionError:         # request no longer active?
            # Return allocated resources to the pool.
            self.pool.release_resources(resources)
        self.notification_set.add(elem.requester_id)
I get this output:
[WARN] [WallTime: 1396153452.730991] DJS : request address in callback [0x1f18490]
[WARN] [WallTime: 1396153452.731454] DJS: transitions [w] -> address [0x1f18490]
[WARN] [WallTime: 1396153452.731776] DJS: transitions [w] -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.732043] DJS: transitions [w] -> grant [2]
[WARN] [WallTime: 1396153452.732398] DJS: transitions [w] -> resources [[rapp: rocon_apps/teleop
id:
uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]]
[INFO] [WallTime: 1396153452.733212] Request queued: 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.733482] DJS: elem -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.733764] DJS: elem -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.734061] DJS: elem -> status [2]
[WARN] [WallTime: 1396153452.734742] DJS: transitions [g] -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.735097] DJS: transitions [g] -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.735369] DJS: transitions [g] -> grant [3]
[WARN] [WallTime: 1396153452.735695] DJS: transitions [g] -> resources [[rapp: rocon_apps/teleop
id:
uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]]
[WARN] [WallTime: 1396153452.736015] DJS: dispatch -> address [0x1f185d0]
[WARN] [WallTime: 1396153452.736313] DJS: dispatch -> msg address [0x1eedb40]
[WARN] [WallTime: 1396153452.736619] DJS: dispatch -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.736906] DJS: dispatch -> status [3]
[WARN] [WallTime: 1396153452.737174] DJS: rq -> address [0x1f18490]
[WARN] [WallTime: 1396153452.737463] DJS: rq -> msg address [0x1eed830]
[WARN] [WallTime: 1396153452.737759] DJS: rq -> id [10d4e332-4e90-40f4-8f79-57c8bc098182]
[WARN] [WallTime: 1396153452.738022] DJS: rq -> status [2]
[INFO] [WallTime: 1396153452.738291] Request granted: 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.738597] DJS: sending -> address [0x1f18490]
[WARN] [WallTime: 1396153452.738933] DJS: sending -> id 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.739228] DJS: sending -> status 2
[WARN] [WallTime: 1396153452.739579] DJS: sending -> resources [rapp: rocon_apps/teleop
id:
uuid: [54, 104, 244, 114, 150, 196, 79, 31, 169, 128, 168, 115, 67, 248, 50, 146]
uri: rocon:/pc/guimul/hydro/precise
remappings: []]
[WARN] [WallTime: 1396153452.740037] DJS: resource pool changed
[WARN] [WallTime: 1396153452.740379] DJS: publishing known resources
[WARN] [WallTime: 1396153452.740843] DJS: requester feedback - request id 10d4e332-4e90-40f4-8f79-57c8bc098182
[WARN] [WallTime: 1396153452.741350] DJS: requester feedback - request status 2
[WARN] [WallTime: 1396153452.741736] DJS: requester feedback - request resources [rapp: rocon_apps/teleop
As you can see, the rset requests and the popped ready_queue request are not the same thing, so any grant() operation on the popped request is irrelevant. I suspect that they are supposed to be one and the same object?
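A hedged diagnostic/fix sketch: before granting, re-fetch the request from rset by uuid so dispatch mutates the object that feedback will actually read (this assumes RequestSet supports dict-style get() by request uuid, which needs checking):

```python
# Inside dispatch(), before elem.request.grant(resources):
if rset is not None:
    live_rq = rset.get(elem.request.uuid)
    if live_rq is not None and live_rq is not elem.request:
        rospy.logwarn('queued request is a stale copy; using rset object')
        elem.request = live_rq
```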
It should be possible to provide different scheduling policies with minimal code by using derived classes, but for that to work cleanly it needs to be kept in mind as a goal.
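As a sketch of that goal (names hypothetical, not the current API): keep the ordering policy behind one overridable hook, so a derived class changes scheduling behaviour without touching protocol handling. Both element fields below are assumed.

```python
class SchedulerBase(object):
    """Protocol handling lives here; ordering is a policy hook."""
    def sort_key(self, element):
        # Default: highest priority first, FIFO within a priority.
        return (-element.request.msg.priority, element.sequence)

class FifoScheduler(SchedulerBase):
    def sort_key(self, element):
        return element.sequence         # ignore priority entirely
```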
Maybe this was caused by commenting out the deep copy in 36f16bc.
@jack-oquin Any chance you could do a deb release for this? I could also do it for you along with our other rocon releases, if you like (that lets you just worry about the sources).
I'm afraid the error message is a bit too cryptic for me to explain. And I can't reproduce this error with certainty.
[ERROR] [WallTime: 1398033132.501574] bad callback: <bound method CompatibilityTreeScheduler._ros_subscriber_concert_client_changes of <concert_schedulers.compatibility_tree_scheduler.scheduler.CompatibilityTreeScheduler object at 0x17fd848>>
Traceback (most recent call last):
  File "/opt/ros/hydro/lib/python2.7/dist-packages/rospy/topics.py", line 682, in _invoke_callback
    cb(msg)
  File "/home/piyushk/rocon_catkin_ws/src/rocon_concert/concert_schedulers/src/concert_schedulers/compatibility_tree_scheduler/scheduler.py", line 118, in _ros_subscriber_concert_client_changes
    self._update()
  File "/home/piyushk/rocon_catkin_ws/src/rocon_concert/concert_schedulers/src/concert_schedulers/compatibility_tree_scheduler/scheduler.py", line 250, in _update
    reply.grant(resources)
  File "/home/piyushk/rocon_catkin_ws/src/concert_scheduling/concert_scheduler_requests/src/concert_scheduler_requests/transitions.py", line 306, in grant
    self._transition(EVENT_GRANT, reason=Request.NONE)
  File "/home/piyushk/rocon_catkin_ws/src/concert_scheduling/concert_scheduler_requests/src/concert_scheduler_requests/transitions.py", line 224, in _transition
    + ' in state ' + str(self.msg.status))
TransitionError: invalid event grant in state 5
Do you plan to start apps upon granting resources?
We talked about this fleetingly earlier. Probably no. of schedulers < no. of requesters, so it would be good to centralise the start-app code in the few schedulers we have.
I can hack on this and send you a PR if you like.
The current implementation never preempts resources it previously allocated, even when a higher-priority request is waiting for them.
This is needed to provide the scheduler with resources to allocate.
The mock conductor script must simulate resources starting and stopping, both randomly and on a known timeline.
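A minimal sketch of such a simulation loop, with hypothetical names throughout (rospy.Timer is real; the publish callback and the client model are assumptions loosely based on the YAML shown earlier):

```python
import random
import rospy

class MockConductor(object):
    """Simulates concert clients starting and stopping."""
    def __init__(self, publish_clients):
        self.publish_clients = publish_clients   # hypothetical callback
        self.clients = {'kobuki': 'connected'}
        # Known timeline: (seconds from start, name, status) triples.
        self.timeline = [(5.0, 'turtlebot', 'connected'),
                         (20.0, 'turtlebot', 'gone')]
        self.start = rospy.Time.now()
        rospy.Timer(rospy.Duration(1.0), self.tick)

    def tick(self, event):
        elapsed = (rospy.Time.now() - self.start).to_sec()
        # Scripted changes first...
        while self.timeline and self.timeline[0][0] <= elapsed:
            _, name, status = self.timeline.pop(0)
            self.clients[name] = status
        # ...then a small chance of a random connect/disconnect.
        if random.random() < 0.05:
            name = random.choice(list(self.clients))
            self.clients[name] = random.choice(['connected', 'gone'])
        self.publish_clients(self.clients)
```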
Migrated from utexas-bwi/rocon_scheduler_requests#8 from @stonier:
The current specification defines a group of resources required for the request:
Resource[] resources
which is simple, and works for a lot of situations. Jack mentioned something along the lines of making requests of requests, i.e. using logical connectives to form more particularly gnarly requests, e.g.
((two foo or one bar) and 3 baz)
in which case, a currently specified list of resources such as five different robots would look like:
(a and b and c and d and e)
This is a nice idea and a lot more powerful than just a list of resources. Does it really have many practical use cases, though?
Would be nice to explore if we identify a need for adding this complexity in a future iteration, so posting this as a kind of TODO marker.
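To make the idea concrete, a purely illustrative sketch of a connective-based request expression (nothing like this exists in scheduler_msgs):

```python
class Leaf(object):
    """A demand for `count` resources whose uri contains `pattern`."""
    def __init__(self, count, pattern):
        self.count, self.pattern = count, pattern

    def satisfied(self, available):
        return sum(1 for r in available if self.pattern in r) >= self.count

class And(object):
    def __init__(self, *terms):
        self.terms = terms

    def satisfied(self, available):
        return all(t.satisfied(available) for t in self.terms)

class Or(object):
    def __init__(self, *terms):
        self.terms = terms

    def satisfied(self, available):
        return any(t.satisfied(available) for t in self.terms)

# ((two foo or one bar) and 3 baz):
expr = And(Or(Leaf(2, 'foo'), Leaf(1, 'bar')), Leaf(3, 'baz'))
```

Note that a real allocator would also have to partition the pool among the terms; this naive satisfiability check lets two terms count the same resource twice.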
There are legitimate reasons to acquire the big scheduler lock at a higher level, where other threads may be involved. The big requester lock should work similarly.
Moved from utexas-bwi/rocon_scheduler_requests#28:
The current scheduler module allows a requester to hang on to a preempted resource indefinitely. That may be useful in some cases, but it relies on all requesters being responsive and well-behaved. While the scheduler protocol is intentionally co-operative, providing a timeout does seem desirable for cases where the requester is not working correctly.
We want a mechanism for the scheduler to allow a reasonable period of time for clean-up without tying up resources forever. Since a fixed time is unlikely to work for all situations, the requester and scheduler should negotiate how long to wait using the hold_time and availability fields. The scheduler's preempt() call would set availability to some reasonable time limit, and the requester could update hold_time if it has different requirements.
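A hedged sketch of that negotiation on the scheduler side (hold_time and availability come from the discussion above; the grace value, the recorded preempt stamp, and the "both windows must pass" rule are assumptions, not protocol spec):

```python
import rospy

GRACE = rospy.Duration(30.0)        # assumed default clean-up window

def preempt_with_deadline(request):
    """Preempt, advertising a clean-up deadline to the requester."""
    request.msg.availability = rospy.Time.now() + GRACE
    request.preempt()

def deadline_expired(request, preempt_stamp):
    """True once both the scheduler's window and the requester's
    requested hold_time (a Duration in the message) have passed."""
    now = rospy.Time.now()
    return (now > request.msg.availability and
            now > preempt_stamp + request.msg.hold_time)
```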
See utexas-bwi/rocon_scheduler_requests#10 for a longer discussion of some aspects involving lost resources.
Some background information can also be found in #23.
A requester currently has to track resource status (i.e. MISSING or not) via the resource_pool topic. While this is eminently doable, it would be convenient to have this information show up in the requester feedback function, which is where a requester (I think) typically does a lot of its decision making.
The variable that gets passed back in the requester feedback function is the RequestSet, which has resource information embedded in scheduler_msgs.Resource; that would feel like a more natural place to get resource status feedback.
Note: this doesn't have to be done, since we already have a way of getting it. What are your thoughts on it though, Jack?
cross-reference: utexas-bwi/rocon_scheduler_requests#9
The likely solution involves publishing a new message whenever the resource_pool changes.
I am considering /schedulable_resources or /schedulable_rocon_resources for the default topic name. The first one seems better.
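A minimal sketch of publishing on change (the KnownResources message type is assumed from the resource_pool discussion above; latching means late subscribers immediately get the current pool):

```python
import rospy
from scheduler_msgs.msg import KnownResources

pub = rospy.Publisher('schedulable_resources', KnownResources,
                      latch=True, queue_size=1)

def on_pool_changed(pool_msg):
    """Call only when the pool actually differs from the last message."""
    pub.publish(pool_msg)
```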
There is a use case for including the request priority of ALLOCATED resources. See this earlier discussion.
Otherwise, a high-priority request for resources that are not available will block everything that follows.
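That blocking comes from the dispatch loop above, which breaks at the first unsatisfiable request. A hedged alternative (a variant sketch, not the current code) scans past blocked requests while still trying them in priority order:

```python
def dispatch_skip_blocked(self):
    """Variant: keep scanning instead of breaking at a blocked head."""
    deferred = []
    while len(self.ready_queue) > 0:
        elem = self.ready_queue.pop()
        resources = self.pool.allocate(elem.request)
        if not resources:
            deferred.append(elem)       # blocked, but don't stop
            continue
        elem.request.grant(resources)
    for elem in deferred:               # put blocked requests back
        self.ready_queue.add(elem)
```

The trade-off: a large high-priority request can now be starved by a stream of small, easily satisfied ones, so some form of aging or reservation would probably be needed as well.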