dityas / protos
Factored Interactive POMDP solver based on symbolic Perseus.
License: GNU Affero General Public License v3.0
Java's default hash computation does not work for DDs. The hashing probably includes class and object metadata, so two DDs hash differently even when their values are the same.
// Two successive belief updates for the same action with observation GL.
DD nextBelief1 = pomdp.beliefUpdate(pomdp.initialBelState, 1, new String[] {"GL"});
nextBelief1.display();
DD nextBelief2 = pomdp.beliefUpdate(nextBelief1, 1, new String[] {"GL"});
nextBelief2.display();

// After clearing the caches, updating nextBelief2 with GR lands back on the
// same distribution as nextBelief1.
Global.clearHashtables();
DD nextBelief3 = pomdp.beliefUpdate(nextBelief2, 1, new String[] {"GR"});
nextBelief3.display();

// Value-wise comparisons behave as expected...
assertFalse(Belief.checkEquals(nextBelief1, nextBelief2));
assertTrue(Belief.checkEquals(nextBelief1, nextBelief1));
assertTrue(Belief.checkEquals(nextBelief1, nextBelief3));

// ...but HashSet lookups fail even for the value-equal DD.
HashSet<DD> beliefSet = new HashSet<DD>();
beliefSet.add(nextBelief1);
System.out.println(beliefSet.contains(nextBelief2)); // false, as expected
System.out.println(beliefSet.contains(nextBelief3)); // false, despite equal values
gives
TigerLoc {1}
TL: 0.65 {}
TR: 0.35 {}
TigerLoc {1}
TL: 0.7752293577981652 {}
TR: 0.22477064220183482 {}
TigerLoc {}
TL: 0.65 {}
TR: 0.35 {}
false
false
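Until DD overrides equals/hashCode on its values, set membership could be made reliable by keying collections on a value-based wrapper. A minimal sketch, assuming a belief can be flattened to a variable name plus a probability vector (BeliefKey and its fields are hypothetical stand-ins, not part of the solver):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value-based key: two beliefs with the same variable and the
// same probabilities hash identically, regardless of object identity.
final class BeliefKey {
    private final String var;      // e.g. "TigerLoc"
    private final double[] probs;  // e.g. {0.65, 0.35}

    BeliefKey(String var, double[] probs) {
        this.var = var;
        this.probs = probs.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BeliefKey)) return false;
        BeliefKey k = (BeliefKey) o;
        return var.equals(k.var) && Arrays.equals(probs, k.probs);
    }

    @Override
    public int hashCode() {
        return Objects.hash(var, Arrays.hashCode(probs));
    }

    public static void main(String[] args) {
        Set<BeliefKey> seen = new HashSet<>();
        seen.add(new BeliefKey("TigerLoc", new double[] {0.65, 0.35}));
        // A structurally identical belief built later is found again:
        System.out.println(seen.contains(new BeliefKey("TigerLoc", new double[] {0.65, 0.35})));
    }
}
```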
The lower-level POMDP of the opponent does a full belief tree expansion after solving the POMDP. For domains with a high branching factor, building this offline belief tree takes a long time. Instead, the context can be switched to the POMDP online and the expansion done for a fixed horizon. This would also save memory by pruning subtrees whose roots are beliefs with zero probability.
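The fixed-horizon idea can be sketched as a recursion that stops at the horizon and skips zero-probability observation branches entirely (all names and numbers here are illustrative, not the solver's API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fixed-horizon online expansion that prunes
// zero-probability branches instead of building the full offline tree.
class OnlineExpansion {
    // A belief node: the probability of the edge into it plus its children.
    static class Node {
        final double prob;
        final List<Node> children = new ArrayList<>();
        Node(double prob) { this.prob = prob; }
    }

    // Expand for a fixed horizon; skip subtrees rooted at zero-probability beliefs.
    static Node expand(double prob, int horizon, double[] obsDist) {
        Node node = new Node(prob);
        if (horizon == 0) return node;
        for (double p : obsDist) {
            if (p <= 0.0) continue; // prune: no memory spent on unreachable beliefs
            node.children.add(expand(p, horizon - 1, obsDist));
        }
        return node;
    }

    static int countNodes(Node n) {
        int c = 1;
        for (Node ch : n.children) c += countNodes(ch);
        return c;
    }

    public static void main(String[] args) {
        // Two of four observations have zero probability, so branching is 2, not 4.
        double[] obsDist = {0.6, 0.4, 0.0, 0.0};
        Node root = expand(1.0, 3, obsDist);
        System.out.println(countNodes(root)); // 1 + 2 + 4 + 8 = 15
    }
}
```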
The current ActionSPUDD objects use a float to represent the cost of performing an action. However, costs can also depend on the state the agent ends up in. A cost DD should be added as well.
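The difference can be illustrated with a plain map standing in for the cost DD; in the solver this would be a DD over the post-action state variables (all values below are made up for illustration):

```java
import java.util.Map;

class CostSketch {
    public static void main(String[] args) {
        // Scalar cost (current ActionSPUDD behaviour): the same everywhere.
        float flatCost = 1.0f;

        // State-dependent cost, stood in for by a map keyed on the next state.
        Map<String, Double> costDD = Map.of(
                "tiger-left", 1.0,   // illustrative values only
                "tiger-right", 0.5);

        // Total cost term for ending up in tiger-left:
        System.out.println(flatCost + costDD.get("tiger-left")); // 2.0
    }
}
```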
The frame is forcefully cast to a POMDP and then checked with instanceof POMDP. This check always returns true once the cast succeeds, and it will break when the lower frames are IPOMDPs, as in level 2 domains.
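The safer dispatch checks the most specific type before any cast. A minimal sketch with stub classes standing in for the solver's real POMDP and IPOMDP types (the subtype relation is assumed for illustration):

```java
// Stub types standing in for the solver's frames (hierarchy assumed).
class FrameDispatch {
    static class POMDP {}
    static class IPOMDP extends POMDP {} // level 1 frame specialising a POMDP

    // Check the most specific type first, before any cast is attempted.
    static String dispatch(POMDP frame) {
        if (frame instanceof IPOMDP) return "solve as IPOMDP";
        return "solve as POMDP";
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new IPOMDP())); // lower frame in a level 2 domain
        System.out.println(dispatch(new POMDP()));
    }
}
```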
During infinite-horizon DP backups, the next belief states are computed in their factored form. While computing these beliefs for each observation, the sequence of observations used to compute the beliefs differs from the sequence used to assign the probabilities. This assigns incorrect probability values to the beliefs.
09 Dec 19 20:06:51 NextBelState [DEBUG]: [[1, 1], [2, 1], [1, 2], [2, 2], [1, 3], [2, 3]]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: Starting belief is (M_j
(m/0_3
(0.0))
(m/1_2
(0.0))
(m/1_3
(0.0))
(m/0_1
(0.0))
(m/1_0
(0.25))
(m/0_2
(0.0))
(m/1_1
(0.0))
(m/0_0
(0.25))
)
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: All possible combinations are [[growl-left, creak-left], [growl-left, creak-right], [growl-left, silence], [growl-right, creak-left], [growl-right, creak-right], [growl-right, silence]]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: Obs dist is DD [] (creak_P
(creak-left(0.025)) (creak-right(0.025)) (silence(0.44999999999999996)))
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: [0.025, 0.025, 0.025, 0.025, 0.44999999999999996, 0.44999999999999996]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: [2, 8, 2, 3, 2, 3, 2, 2, 8, 2, 3, 2, 3, 2]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, creak-left] belief is [DD [] (tiger-location_P
(tiger-left(0.8500000000000001)) (tiger-right(0.15)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.37250000000000005)) (m/0_2(0.0)) (m/0_3(0.1275)) (m/1_0(0.0)) (m/1_1(0.30250000000000005)) (m/1_2(0.0)) (m/1_3(0.1975)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, creak-right] belief is [DD [] (tiger-location_P
(tiger-left(0.15)) (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.1275)) (m/0_2(0.0)) (m/0_3(0.37250000000000005)) (m/1_0(0.0)) (m/1_1(0.1975)) (m/1_2(0.0)) (m/1_3(0.30250000000000005)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, silence] belief is [DD [] (tiger-location_P
(tiger-left(0.8500000000000001)) (tiger-right(0.15)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.37250000000000005)) (m/0_2(0.0)) (m/0_3(0.1275)) (m/1_0(0.0)) (m/1_1(0.30250000000000005)) (m/1_2(0.0)) (m/1_3(0.1975)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, creak-left] belief is [DD [] (tiger-location_P
(tiger-left(0.15)) (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.1275)) (m/0_2(0.0)) (m/0_3(0.37250000000000005)) (m/1_0(0.0)) (m/1_1(0.1975)) (m/1_2(0.0)) (m/1_3(0.30250000000000005)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, creak-right] belief is [DD [] (tiger-location_P
(tiger-left(0.8500000000000001)) (tiger-right(0.15000000000000002)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.3725)) (m/0_2(0.0)) (m/0_3(0.1275)) (m/1_0(0.0)) (m/1_1(0.3025)) (m/1_2(0.0)) (m/1_3(0.1975)))
, DD [] (0.45)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, silence] belief is [DD [] (tiger-location_P
(tiger-left(0.15000000000000002)) (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.1275)) (m/0_2(0.0)) (m/0_3(0.3725)) (m/1_0(0.0)) (m/1_1(0.1975)) (m/1_2(0.0)) (m/1_3(0.3025)))
, DD [] (0.45)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, silence] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, silence] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, silence] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, silence] belief is [DD [] (0.5), DD [] (M_j_P
(m/0_0(0.0)) (m/0_1(0.25)) (m/0_2(0.0)) (m/0_3(0.25)) (m/1_0(0.0)) (m/1_1(0.25)) (m/1_2(0.0)) (m/1_3(0.25)))
, DD [] (0.165)]
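The misalignment can be reproduced in isolation: pairing beliefs and probabilities by index is only safe if both sides iterate the observation combinations in the same order. The orderings and values below are illustrative, echoing the log above:

```java
import java.util.List;

class ObsOrdering {
    public static void main(String[] args) {
        // Two different orderings of the same observations, as in the bug.
        List<String> orderUsedForBeliefs = List.of("creak-left", "creak-right", "silence");
        List<String> orderUsedForProbs   = List.of("creak-left", "silence", "creak-right");

        double[] probs = {0.025, 0.45, 0.025}; // aligned with orderUsedForProbs

        // Buggy pairing: belief i gets probs[i] although the orders differ,
        // so "creak-right" silently receives silence's probability.
        System.out.println(orderUsedForBeliefs.get(1) + " wrongly gets " + probs[1]);

        // Fix: look each observation up in the ordering its probability used
        // (or, better, compute both inside a single loop over one ordering).
        int j = orderUsedForProbs.indexOf(orderUsedForBeliefs.get(1));
        System.out.println(orderUsedForBeliefs.get(1) + " correctly gets " + probs[j]);
    }
}
```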
The current POMDP implementation of the Symbolic Perseus solver uses a POMDP belief update. It looks like it can be tweaked to use the L1 belief update instead and compute the optimal policy for the given belief points. This would be quicker than writing L1 value iteration from scratch.
The current globals are static, and many classes like OP rely on them. This works well when solving a single POMDP. But for IPOMDPs, where frames are constantly being switched and belief updates are made for different frames, setting and unsetting the proper globals becomes too complicated. Any bug in doing so will also give incorrect results for the OP functions.
The solver does not support multiple types of agents at the lower level with different action and observation functions and variables.
The higher level domain should solve the POMDP model with a single call to the Domain.solve method.
Create smooth subclassing for L1 domains based on lower levels of the adversary
6f752f1 finishes Oj extraction and conditioning the DD on Mj for computing j's belief update. However, this only works for a single frame at the moment.
The current implementation uses a policy tree to track Mj and Aj. This is restrictive since it does not allow tracking non-optimal actions taken by the opponent. Using a belief tree instead, and greedily taking agent j's optimal actions with a small exploration probability for other actions, would create a more flexible model of the opponent.
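The exploration idea amounts to building P(Aj | Mj) as an epsilon-greedy distribution over j's actions. A minimal sketch (epsilon, the action names, and the method are all illustrative assumptions):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OpponentActionModel {
    // P(Aj | Mj): weight j's optimal action greedily, with a small
    // exploration mass spread uniformly over the remaining actions.
    static Map<String, Double> actionDistribution(
            List<String> actions, String optimal, double epsilon) {
        Map<String, Double> dist = new LinkedHashMap<>();
        for (String a : actions) {
            dist.put(a, a.equals(optimal)
                    ? 1.0 - epsilon
                    : epsilon / (actions.size() - 1));
        }
        return dist;
    }

    public static void main(String[] args) {
        List<String> actions = List.of("listen", "open-left", "open-right");
        // Non-optimal actions keep non-zero probability, so the belief tree
        // can track an opponent that deviates from its optimal policy.
        System.out.println(actionDistribution(actions, "listen", 0.1));
    }
}
```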
The IPOMDP solve method for level 0 will be different from the POMDP solve method, since it will use joint action spaces.
The L1 solver uses PBVI and Symbolic Perseus. These are approximate solvers, and convergence is not always guaranteed. This is evident when solving the tiger problem for more time steps with a limited-horizon belief expansion. However, the solver does oscillate between a few values of Bellman error. This can be used as a criterion to declare approximate convergence.
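That oscillation test can be sketched as a sliding window over recent Bellman errors, declaring approximate convergence once the window's spread stays under a tolerance (window size and tolerance below are assumptions, not tuned values):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;

class ApproxConvergence {
    // Declare approximate convergence when the last k Bellman errors stay
    // within a small band, i.e. the solver is oscillating, not diverging.
    static boolean approximatelyConverged(Deque<Double> recent, int k, double band) {
        if (recent.size() < k) return false;
        return (Collections.max(recent) - Collections.min(recent)) < band;
    }

    public static void main(String[] args) {
        Deque<Double> window = new ArrayDeque<>();
        double[] bellmanErrors = {5.0, 2.1, 0.52, 0.48, 0.51, 0.49}; // illustrative run
        boolean converged = false;
        for (double e : bellmanErrors) {
            window.addLast(e);
            if (window.size() > 4) window.removeFirst(); // keep the last 4 errors
            converged = approximatelyConverged(window, 4, 0.1);
        }
        System.out.println(converged); // the last 4 errors differ by < 0.1
    }
}
```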
Use the belief update equation for generating belief updates that consider the opponent's actions at level 0.
The current belief update equation merges the Mj and Aj nodes into a single Mj x Aj node. This does not allow i to consider j's non-optimal actions and also ends up unfactoring the variables. Additionally, the Mj x Aj transition, observation, and reward functions have to be re-created after each belief update. This makes it very inefficient and takes a long time for high branching factors of the belief tree/policy tree. Instead, Mj and Aj can be split, and only the factor P(Aj|Mj) has to be rebuilt after each update.
The current POMDP objects can be considered as level 0 frames. These have to be subclassed to a level 1 IPOMDP class which can hold POMDP frames of other agents.
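The subclassing could look roughly like this, with stub classes in place of the solver's real ones (names and fields assumed): a level 1 IPOMDP is itself a frame and holds the lower-level frames of the other agents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hierarchy sketch: level 0 frames are POMDPs; an IPOMDP at
// level 1 extends the frame type and holds other agents' lower frames.
class FrameHierarchy {
    static class POMDP {                // level 0 frame
        final String name;
        POMDP(String name) { this.name = name; }
    }

    static class IPOMDP extends POMDP { // level 1 frame
        final List<POMDP> lowerFrames = new ArrayList<>();
        IPOMDP(String name) { super(name); }
    }

    public static void main(String[] args) {
        IPOMDP agentI = new IPOMDP("agent-i-L1");
        // Multiple frames of the other agent live inside the L1 model.
        agentI.lowerFrames.add(new POMDP("agent-j-frame-1"));
        agentI.lowerFrames.add(new POMDP("agent-j-frame-2"));
        System.out.println(agentI.lowerFrames.size());
    }
}
```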