
Factored Interactive POMDP solver based on symbolic Perseus.

License: GNU Affero General Public License v3.0

Languages: Java 97.68%, ANTLR 0.65%, Jupyter Notebook 1.67%
Topics: multiagent-planning, pomdp, ipomdp, multiagent-systems, probabilistic-planning

protos's Introduction

Protos

A factored I-POMDP solver

Protos is a factored I-POMDP solver developed at THINC Lab @ UGA. It builds on Jesse Hoey's implementation of symbolic Perseus and, with some modifications, extends it to I-POMDPs.

❗ The solver has so far been tested only on Linux systems. Running it on Windows might cause problems because the paths are hard-coded in Linux format.

❗ The solver works on Java 15 and above. Please use an Amazon Corretto or GraalVM build of the JVM; this might seriously help with performance, although I haven't run any formal benchmarks.

❗ Protos is under active development and is likely to undergo major changes as new features are added. If you are using this solver for research work, I highly recommend cloning or forking the repo and keeping a local copy.

If you use this solver in your research, please cite the following paper.

Shinde, Aditya, Prashant Doshi, and Omid Setayeshfar. "Cyber Attack Intent
Recognition and Active Deception using Factored Interactive POMDPs." Proceedings
of the 20th International Conference on Autonomous Agents and MultiAgent
Systems. 2021.
@inproceedings{shinde2021cyber,
    title={Cyber Attack Intent Recognition and Active Deception using Factored
    Interactive POMDPs},
    author={Shinde, Aditya and Doshi, Prashant and Setayeshfar, Omid},
    booktitle={Proceedings of the 20th International Conference on Autonomous
    Agents and MultiAgent Systems},
    pages={1200--1208},
    year={2021}
}

Usage

Building the solver

Run the Gradle wrapper:

$ ./gradlew shadowJar

The solver builds to a JAR file at build/libs/Protos-all.jar

Running the solver

java -jar Protos-all.jar -d <domain_file>
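For example, with the L2 tiger domain from the Example section below saved as domains/tiger_l2.domain (an illustrative path; check the domains directory for the actual filename), the invocation would be:

java -jar build/libs/Protos-all.jar -d domains/tiger_l2.domain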

Domain file format

The I-POMDP domain file follows an extension of the SPUDD format for representing ADDs in plaintext. Currently, the solver can only solve 2-agent interactions.

Example

Here is the domain file for a level-2 multi-agent tiger problem. The file can be found in the domains directory and run directly to get a policy tree for the L2 tiger I-POMDP. The policy tree will be written to a file in dot format for Graphviz.
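Once the run completes, the resulting dot file can be rendered with Graphviz, for example (policy.dot is only a placeholder; the actual output filename is determined by the run):

$ dot -Tpng policy.dot -o policy.png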

-- Lines beginning with '--' are single line comments

-- Any domain file should first define all the random variables that are going
-- to be used for the problem.
-- In this case, for the L2 multi-agent tiger problem, we will begin by defining
-- the physical state variable for the tiger location states.
(defvar TigerLoc (TL TR))

-- In the multi-agent tiger problem, there are two agents. Let's call them agent
-- I and agent J. Let's assume that J is at levels 0 and 2, and I is at level 1.
-- Let's define a model variable for agent I at L1 modeling J at L0. Let's begin
-- with 3 initial models. We will define these models later.
(defvar agent_j (m0 m1 m2))

-- Now, let's define the model variable for J at L2 modeling I at L1.
(defvar agent_il1 (m0 m1))

-- I-POMDPs allow agents to model opponents with different frames. These frames
-- can also be represented as random variables. In this example, we will assume
-- only a single frame. However, the solver does support multiple frames at all
-- levels.
(defvar Frame_jl1 (frame1))
(defvar Frame_il2 (frame1))

-- Add a random variable for agent I's actions
(defvar Agent_actions (OL OR L))

-- Another one for agent J's actions
(defvar Agentj_actions (OL OR L))

-- Random variable for J's observations
(defvar Growl_j (GL GR))
(defvar Creak_j (CL CR SL))

-- Agent I's observations
(defvar Growl (GL GR))
(defvar Creak (CL CR SL))


-- Initialize agent I's L1 model variable representing J's L0 models
-- The three models that we defined above can now be populated with actual
-- beliefs. In this case, we start with 3 point beliefs: one uniform distribution,
-- and 2 beliefs with 85% probability of the tiger being on either side.
-- The format for defining models is,
-- (<frame_name>        (<model_name>   (<DD>)))
-- More details will be provided in an accompanying tutorial. But
-- for now, here's the initialization of the L1 model variable.
(initmodelvar agent_j
    (frames Frame_jl1)
    (
	    (frame1 	(m0 	(0.5)))

	    (frame1 	(m1 	(
		    (TigerLoc 
			    (TL 	(0.85))
			    (TR 	(0.15))
		    )		
		)))

	    (frame1 	(m2 	(
                (TigerLoc
                    (TL         (0.15))
                    (TR         (0.85))
                )
            )))
	)
)


-- DDs for transitions, observations, beliefs and rewards
-- Common initial belief for all agents
(defdd initLoc
    (TigerLoc 	UNIFORM)
)

-- Transition if anyone opens door
(defdd openDoorDD
    (TigerLoc' UNIFORM)
)

-- J's L0 assumption over I's action distribution.
-- Since agent J at L0 cannot directly model agent I at L1, it will assume a
-- static distribution over I's possible actions.
(defdd aIDist
    (Agent_actions
	(OL 	(0.005))
	(OR 	(0.005))
	(L 	(0.99))
    )
)

-- J's observation function for listening to growls
(defdd growlObsDD
    (TigerLoc'
	(TL
    	    (Growl_j'
		(GL 	(0.85))
		(GR 	(0.15))
            )
	)
			
	(TR
	    (Growl_j'
		(GL 	(0.15))
		(GR 	(0.85))
	    )
	)
    )
)

-- I's observation function for listening to creaks
(defdd iListensCreak
    (Agentj_actions
	(OL
    	    (Creak'
		(CL 	(0.9))
		(CR 	(0.05))
		(SL 	(0.05))
	    )
	)
		
	(OR
	    (Creak'
		(CL 	(0.05))
		(CR 	(0.9))
		(SL 	(0.05))
	    )
	)
		
	(L
	    (Creak'
		(CL 	(0.05))
		(CR 	(0.05))
		(SL 	(0.9))
	    )
	)
    )
)

-- I's observation function for listening to growls
(defdd iListensGrowl
    (TigerLoc'
	(TL 	(Growl'	(GL	(0.85)) 	(GR 	(0.15))))
	(TR 	(Growl'	(GR	(0.85)) 	(GL 	(0.15))))
    )
)

-- I's joint action transition for the listen action
(defdd iListensDD
    (Agentj_actions
	(OL
            (TigerLoc' UNIFORM)
	)
		
	(OR
	    (TigerLoc' UNIFORM)
	)
		
	(L
	    (TigerLoc
		(TL 	(TigerLoc' TL))
		(TR 	(TigerLoc' TR))
	    )
	)
    )
	
)

-- J's joint action transition for the listen action
(defdd jListensDD
    (Agent_actions
	(OL 	(TigerLoc' UNIFORM))
	(OR 	(TigerLoc' UNIFORM))
	(L 		(SAME TigerLoc))
    )
)

-- For the transition and the observation functions, we use DBNs (Dynamic
-- Bayesian Networks). A detailed tutorial will explain how to define 
-- these custom DBNs, but for now, here's the DBN for the open action 
-- taken by agent I
-- I's DBN for open action
(defdbn actionIOpen
    (TigerLoc 	(TigerLoc' 	UNIFORM))
    (Growl 		(Growl' 	UNIFORM))
    (Creak 		(Creak' 	UNIFORM))	
)

-- I's DBN for listen action
(defdbn actionIListen
    (TigerLoc 	(iListensDD))
    (Growl 		(iListensGrowl))
    (Creak 		(iListensCreak))
)

-- J's DBNs
-- J's L0 DBN for opening doors
(defdbn actionOpenAny
    (TigerLoc 	(openDoorDD))
    (Growl_j 	(Growl_j' UNIFORM))
)

-- J's L0 DBN for listening
-- Notice that the Agent_actions var is summed out after multiplying with
-- the assumed distribution of I's actions. This is mainly because at L0, agent J
-- does not explicitly model agent I
(defdbn actionL
    (TigerLoc 	(# (Agent_actions) (aIDist * jListensDD)))
    (Growl_j 	(growlObsDD))
)

-- J's level N DBN for opening doors
(defdbn actionJOpen
    (TigerLoc   (TigerLoc' UNIFORM))
    (Growl_j    (Growl_j' UNIFORM))
    (Creak_j    (Creak_j' UNIFORM))
)

-- J's level N DBN for listening

(defdd jListensCreakDD

    (Agent_actions
        (OL
            (Creak_j'
                (CL     (0.9))
                (CR     (0.05))
                (SL     (0.05))
            )
        )

        (OR
            (Creak_j'
                (CL     (0.05))
                (CR     (0.9))
                (SL     (0.05))
            )
        )

        (L
            (Creak_j'
                (CL     (0.05))
                (CR     (0.05))
                (SL     (0.9))
            )
        )
    )
)

(defdbn actionJListens
    (TigerLoc   (jListensDD))
    (Growl_j    (growlObsDD))
    (Creak_j    (jListensCreakDD))

)

-- Define agents
-- Agent J L0
(defpomdp agentJ
    (S 	
        (TigerLoc)
    )

    (O
        (Growl_j)
    )

    (A 	Agentj_actions)
        
    (dynamics
        (OL 	(actionOpenAny))
        (OR 	(actionOpenAny))
        (L 		(actionL))
    )
       
    (R
        (OL 	(TigerLoc 
                    (TL 	(-100))
                    (TR 	(10))
        ))
        
        (OR		(TigerLoc
                    (TL 	(10))
                    (TR 	(-100))
        ))
            
        (L 		(-1))
    )
        
    (discount 0.9)	
)

-- Agent I L1
(defipomdp agentI
    (S 	
        (TigerLoc)
    )

    (O
        (Growl Creak)
    )

    (A 	Agent_actions)
    (Aj Agentj_actions)
    (Mj agent_j)
    
    (Thetaj Frame_jl1 	
    	(frame1 agentJ)
    )
        
    (dynamics
        (OL 	(actionIOpen))
        (OR 	(actionIOpen))
        (L 	(actionIListen))
    )
       
    (R
        (OL 	
            (TigerLoc 
                (TL 	(-100))
                (TR 	(10))
            )
        )
        
        (OR		
        			
            (TigerLoc
                (TR 	(-100))
                (TL 	(10))
            )
       	)
            
        (L 		(-1))
    )
        
    (discount 0.9)
    (H 4)	
)

-- Initialize models for L2 model var
(initmodelvar agent_il1
    (frames Frame_il2)
    (
	(frame1 	(m0 	(agent_j m0)))
	(frame1		(m1 	(agent_j m1)))
    )
)

-- Agent J L2
(defipomdp agentJl2
    (S 	
        (TigerLoc)
    )

    (O
        (Growl_j Creak_j)
    )

    (A 	Agentj_actions)
    (Aj Agent_actions)
    (Mj agent_il1)
    (Thetaj Frame_il2 	
    	(frame1 agentI)
    )
        
    (dynamics
        (OL 	(actionJOpen))
        (OR 	(actionJOpen))
        (L 		(actionJListens))
    )
       
    (R
        (OL 	
        	(TigerLoc 
                (TL 	(-100))
                (TR 	(10))
        	)
        )
        
        (OR		
        			
            (TigerLoc
                (TR 	(-100))
                (TL 	(10))
            )
       	)
            
        (L 		(-1))
    )
        
    (discount 0.9)
    (H 4)	
)

-- Initial belief of the L2 agent
(defdd initL2agentJ
    (
        (agent_il1
            (m0 (0.5))
            (m1 (0.5))
        )
        * (0.5)
    )
)

-- Solvers
(defpbvisolv ipomdp agentJl2Solver)

-- run the solver
-- To run the solver, the syntax is:
-- (defpol <policy_name> = solve <solver_name> (<list of initial beliefs>)
-- <agent_name> <backups> <depth of belief search>)

(run
    (defpol l2pol = solve agentJl2Solver ((initL2agentJ)) agentJl2 100 3)
    (poltree (initL2agentJ) agentJl2 l2pol 3)
)


protos's Issues

DD HashSets not able to recognize same contents if Global hash tables are cleared.

Java's default hash computation does not work for DDs. The hashing probably includes class and object metadata, and hence differs even when the DD values are the same.

// Two consecutive belief updates on the "GL" observation
DD nextBelief1 = pomdp.beliefUpdate(pomdp.initialBelState, 1, new String[] {"GL"});
nextBelief1.display();
DD nextBelief2 = pomdp.beliefUpdate(nextBelief1, 1, new String[] {"GL"});
nextBelief2.display();
// Clear the global hash tables before computing the next belief
Global.clearHashtables();
DD nextBelief3 = pomdp.beliefUpdate(nextBelief2, 1, new String[] {"GR"});
nextBelief3.display();
// Belief.checkEquals compares values: nextBelief3 has the same values as nextBelief1
assertFalse(Belief.checkEquals(nextBelief1, nextBelief2));
assertTrue(Belief.checkEquals(nextBelief1, nextBelief1));
assertTrue(Belief.checkEquals(nextBelief1, nextBelief3));
// HashSet lookups go through DD's hashCode: contains(nextBelief3) should be
// true since its values match nextBelief1, but it prints false
HashSet<DD> beliefSet = new HashSet<DD>();
beliefSet.add(nextBelief1);
System.out.println(beliefSet.contains(nextBelief2));
System.out.println(beliefSet.contains(nextBelief3));

gives

TigerLoc  {1}
   TL: 0.65  {}
   TR: 0.35  {}

TigerLoc  {1}
   TL: 0.7752293577981652  {}
   TR: 0.22477064220183482  {}

TigerLoc  {}
   TL: 0.65  {}
   TR: 0.35  {}

false
false
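This is standard Java behavior rather than anything specific to the solver: unless a class overrides hashCode() and equals() consistently with its contents, HashSet falls back to identity-based hashing, so two structurally identical objects are treated as different elements. A minimal, self-contained sketch of the same effect (the classes below are hypothetical and not part of Protos):

import java.util.HashSet;
import java.util.Objects;

public class IdentityHashDemo {

    // No hashCode()/equals() overrides: HashSet uses Object's identity hash,
    // analogous to the DD behavior reported above.
    static class IdentityNode {
        final String var;
        final double val;
        IdentityNode(String var, double val) { this.var = var; this.val = val; }
    }

    // Value-based overrides: structurally equal nodes hash and compare the same.
    static class ValueNode {
        final String var;
        final double val;
        ValueNode(String var, double val) { this.var = var; this.val = val; }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ValueNode)) return false;
            ValueNode n = (ValueNode) o;
            return n.var.equals(var) && n.val == val;
        }

        @Override
        public int hashCode() { return Objects.hash(var, val); }
    }

    public static void main(String[] args) {
        HashSet<IdentityNode> identitySet = new HashSet<>();
        identitySet.add(new IdentityNode("TigerLoc", 0.65));
        // Structurally identical object is not found: identity-based hashing
        System.out.println(identitySet.contains(new IdentityNode("TigerLoc", 0.65))); // false

        HashSet<ValueNode> valueSet = new HashSet<>();
        valueSet.add(new ValueNode("TigerLoc", 0.65));
        // With value-based hashCode/equals the lookup succeeds
        System.out.println(valueSet.contains(new ValueNode("TigerLoc", 0.65))); // true
    }
}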

Implement Online opponent model expansion for lower level POMDP

The opponent's lower-level POMDP currently gets a full belief tree expansion after it is solved. For domains with a high branching factor, building this offline belief tree takes a long time. Instead, the context can be switched to the POMDP online and the expansion can be done for a fixed horizon. This would also save memory by pruning subtrees whose roots have zero-probability beliefs.
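As a rough illustration of the proposed change (the types and method names below are hypothetical, not the solver's API), a bounded-horizon online expansion that prunes zero-probability branches could look like:

import java.util.List;

// Hypothetical sketch only: these types are illustrative, not the solver's API.
interface BeliefNode {
    List<String> actions();
    List<String> observations();
    double observationProbability(String action, String obs);
    BeliefNode update(String action, String obs);
    void addChild(String action, String obs, BeliefNode child);
}

class OnlineExpansion {
    // Expand the opponent's belief tree online up to a fixed horizon,
    // skipping observation branches that occur with zero probability so
    // their subtrees are never built.
    static void expand(BeliefNode node, int horizon) {
        if (horizon == 0) return;
        for (String a : node.actions()) {
            for (String o : node.observations()) {
                if (node.observationProbability(a, o) <= 0.0) continue;
                BeliefNode child = node.update(a, o);
                node.addChild(a, o, child);
                expand(child, horizon - 1);
            }
        }
    }
}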

Incorrect Probabilities assigned to observations when computing nextBelStates

During infinite-horizon DP backups, the next belief states are computed in their factored form. While computing these beliefs for each observation, the sequence of observations used to compute the beliefs differs from the sequence used to assign the probabilities, so probability values are assigned to the wrong beliefs.

09 Dec 19 20:06:51 NextBelState [DEBUG]: [[1, 1], [2, 1], [1, 2], [2, 2], [1, 3], [2, 3]]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: Starting belief is (M_j
	(m/0_3
	(0.0))
	(m/1_2
	(0.0))
	(m/1_3
	(0.0))
	(m/0_1
	(0.0))
	(m/1_0
	(0.25))
	(m/0_2
	(0.0))
	(m/1_1
	(0.0))
	(m/0_0
	(0.25))
)
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: All possible combinations are [[growl-left, creak-left], [growl-left, creak-right], [growl-left, silence], [growl-right, creak-left], [growl-right, creak-right], [growl-right, silence]]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: Obs dist is DD [] (creak_P
  (creak-left(0.025))  (creak-right(0.025))  (silence(0.44999999999999996)))

09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: [0.025, 0.025, 0.025, 0.025, 0.44999999999999996, 0.44999999999999996]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: [2, 8, 2, 3, 2, 3, 2, 2, 8, 2, 3, 2, 3, 2]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, creak-left] belief is [DD [] (tiger-location_P
  (tiger-left(0.8500000000000001))  (tiger-right(0.15)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.37250000000000005))  (m/0_2(0.0))  (m/0_3(0.1275))  (m/1_0(0.0))  (m/1_1(0.30250000000000005))  (m/1_2(0.0))  (m/1_3(0.1975)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, creak-right] belief is [DD [] (tiger-location_P
  (tiger-left(0.15))  (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.1275))  (m/0_2(0.0))  (m/0_3(0.37250000000000005))  (m/1_0(0.0))  (m/1_1(0.1975))  (m/1_2(0.0))  (m/1_3(0.30250000000000005)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-left, silence] belief is [DD [] (tiger-location_P
  (tiger-left(0.8500000000000001))  (tiger-right(0.15)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.37250000000000005))  (m/0_2(0.0))  (m/0_3(0.1275))  (m/1_0(0.0))  (m/1_1(0.30250000000000005))  (m/1_2(0.0))  (m/1_3(0.1975)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, creak-left] belief is [DD [] (tiger-location_P
  (tiger-left(0.15))  (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.1275))  (m/0_2(0.0))  (m/0_3(0.37250000000000005))  (m/1_0(0.0))  (m/1_1(0.1975))  (m/1_2(0.0))  (m/1_3(0.30250000000000005)))
, DD [] (0.025)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, creak-right] belief is [DD [] (tiger-location_P
  (tiger-left(0.8500000000000001))  (tiger-right(0.15000000000000002)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.3725))  (m/0_2(0.0))  (m/0_3(0.1275))  (m/1_0(0.0))  (m/1_1(0.3025))  (m/1_2(0.0))  (m/1_3(0.1975)))
, DD [] (0.45)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=listen and o=[growl-right, silence] belief is [DD [] (tiger-location_P
  (tiger-left(0.15000000000000002))  (tiger-right(0.8500000000000001)))
, DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.1275))  (m/0_2(0.0))  (m/0_3(0.3725))  (m/1_0(0.0))  (m/1_1(0.1975))  (m/1_2(0.0))  (m/1_3(0.3025)))
, DD [] (0.45)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-left, silence] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-left and o=[growl-right, silence] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-left, silence] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, creak-left] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.17)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, creak-right] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]
09 Dec 19 20:06:51 TestOnlineSymbolicPerseus [DEBUG]: For Ai=open-right and o=[growl-right, silence] belief is [DD [] (0.5), DD [] (M_j_P
  (m/0_0(0.0))  (m/0_1(0.25))  (m/0_2(0.0))  (m/0_3(0.25))  (m/1_0(0.0))  (m/1_1(0.25))  (m/1_2(0.0))  (m/1_3(0.25)))
, DD [] (0.165)]

DD for action costs

The current ActionSPUDD objects use a float to represent the cost of performing an action. However, costs can also depend on the state the agent ends up in. A cost DD should be added as well.

IPOMDP solve for level 0

The IPOMDP solve method for level 0 will differ from the POMDP solve method since it uses joint action spaces.

Split the Mj and Aj factors in the L1 belief update

The current belief update equation merges the Mj and Aj nodes into a single Mj x Aj node. This does not allow i to consider j's non-optimal actions and also ends up un-factoring the variables. Additionally, the Mj x Aj transition, observation and reward functions have to be re-created after each belief update. This makes the update very inefficient and slow for belief trees / policy trees with high branching factors. Instead, Mj and Aj can be split, and only the factor P(Aj|Mj) has to be rebuilt after each update.
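For reference, here is a sketch of the split factorization in the standard level-1 I-POMDP belief update (notation assumed here, not taken from the solver). With Mj and Aj kept as separate factors, only the conditional P(Aj|Mj) depends on solving j's models, while the transition and observation factors stay indexed by the actions:

b'(s', m_j') ∝ Σ_{s, m_j} b(s, m_j) Σ_{a_j} P(a_j | m_j) T(s, a_i, a_j, s') O_i(s', a_i, a_j, o_i) Σ_{o_j} O_j(s', a_j, o_j) [m_j' = SE(m_j, a_j, o_j)]

where SE(m_j, a_j, o_j) denotes agent j's model update.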

Fix symbolic perseus for solving L1 domains with joint action spaces

The current POMDP implementation of the symbolic Perseus solver uses a POMDP belief update. But it looks like it can be tweaked to implement the L1 belief update instead and compute the optimal policy for the given belief points. This will be quicker than writing L1 value iteration from scratch.

Make globals non static and maintain them as an object for each frame.

The current globals are static, and many classes like OP rely on them. This works well while solving a single POMDP. But for IPOMDPs, where frames are constantly being switched and belief updates are made for different frames, setting and unsetting the proper globals becomes too complicated. Also, any bug in doing so will give incorrect results from the OP functions.
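A generic sketch of the direction this suggests (class and field names below are hypothetical, not the solver's): wrap the per-frame state in a context object and pass it explicitly to the operations that currently read static globals.

import java.util.List;

// Hypothetical sketch, not the solver's API: per-frame context instead of
// static globals shared by every frame.
class FrameContext {
    final List<String> varNames;
    final List<Integer> varDomSizes;
    FrameContext(List<String> varNames, List<Integer> varDomSizes) {
        this.varNames = varNames;
        this.varDomSizes = varDomSizes;
    }
}

class Ops {
    // Operations receive the context explicitly, so switching frames just means
    // passing a different object; nothing global has to be set or unset.
    static int numValues(FrameContext ctx, String varName) {
        return ctx.varDomSizes.get(ctx.varNames.indexOf(varName));
    }
}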

Multiple frames for L0

The solver does not support multiple types of agents at lower levels with different action and observation functions and variables.

Determine convergence criteria in case the L1 solver does not reach fixed convergence

The L1 solver uses PBVI and symbolic Perseus. These are approximate solvers, and convergence is not always guaranteed. This is evident while solving the tiger problem for longer time steps with a limited-horizon belief expansion. However, the solver does oscillate between a few values of the Bellman error. This can be used as a criterion to declare approximate convergence.
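A minimal sketch of such a criterion (purely illustrative, not the solver's code): declare approximate convergence when new Bellman errors keep revisiting values already seen in a recent window instead of decreasing monotonically.

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: track the last few Bellman errors and report approximate
// convergence when the latest error matches one already in the window, i.e.
// the error is oscillating between a few values.
class ApproxConvergence {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize;
    private final double tolerance;

    ApproxConvergence(int windowSize, double tolerance) {
        this.windowSize = windowSize;
        this.tolerance = tolerance;
    }

    boolean update(double bellmanError) {
        boolean seenBefore = window.stream()
                .anyMatch(e -> Math.abs(e - bellmanError) < tolerance);
        window.addLast(bellmanError);
        if (window.size() > windowSize) window.removeFirst();
        // Only declare convergence once the window is full and the error repeats.
        return seenBefore && window.size() == windowSize;
    }
}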

Use the full Belief Tree to track opponent instead of the policy tree

The current implementation uses a policy tree to track Mj and Aj. This is restrictive since it does not allow tracking non-optimal actions taken by the opponent. Using a belief tree instead, and greedily using agent j's optimal actions with a small exploration probability for other actions, will create a more flexible model of the opponent.
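As a small illustration of the opponent action distribution this would imply (hypothetical code, not the solver's): the optimal action gets most of the probability mass, and the remaining exploration probability is spread over the other actions so their branches of the belief tree are still tracked.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a P(a_j | m_j) that is greedy on j's optimal action but
// keeps a small exploration probability on every other action.
// Assumes at least two actions are available.
class OpponentActionDistribution {
    static Map<String, Double> build(List<String> actions, String optimal, double epsilon) {
        Map<String, Double> dist = new HashMap<>();
        for (String a : actions)
            dist.put(a, a.equals(optimal)
                    ? 1.0 - epsilon
                    : epsilon / (actions.size() - 1));
        return dist;
    }
}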
