Robust Deep RL with a Soft Actor-Critic approach

Robust Deep RL with a Soft Actor-Critic approach with adversarial perturbation on state observations

I designed a new robust deep RL method, based on a Soft Actor-Critic approach, that handles adversarial perturbations on state observations. The work builds on SA-MDP, proposed by Zhang et al. (2020). For a more detailed explanation, please see the attached PDF file. (2022 Spring Semester, Personal Project Research, Kyungphil Park)

SA-MDP (State-Adversarial MDP)

SA-MDP treats a fixed adversarial attack on the state observation as the worst-case situation, the one that minimizes the Q value; Zhang et al. (2020) formalize this setting as an SA-MDP (equations from Zhang et al. (2020)).
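For reference, the worst-case value behind this definition can be written as below. This restates the Bellman equation for the optimal adversary as it is usually given for SA-MDP in Zhang et al. (2020). The notation is an assumption here and may differ from the attached PDF: B(s) is the set of perturbed observations the adversary may choose for the true state s, ν* is the optimal (worst-case) adversary, and π is the fixed policy.

```latex
\tilde{V}_{\pi \circ \nu^{*}}(s)
  = \min_{s_{\nu} \in B(s)} \sum_{a} \pi(a \mid s_{\nu})
    \sum_{s'} p(s' \mid s, a)
    \left[ R(s, a, s') + \gamma \, \tilde{V}_{\pi \circ \nu^{*}}(s') \right]
```

The policy only sees the perturbed observation s_ν, while the transition and reward still depend on the true state s.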

SA-SAC Regularizer

SA-SAC

In this work, we need to solve a minimax problem: minimizing the policy loss for the worst case.

Objective function
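The inner maximization of such a minimax problem is often approximated with projected gradient descent (PGD). The sketch below illustrates the idea on a toy problem and is not the repository's implementation. It assumes a KL-divergence term of the kind used for policy regularizers in Zhang et al. (2020), an l-infinity ball of radius eps around the state, a tiny linear-softmax policy, and finite-difference gradients in place of an autograd framework. All names (policy, pgd_worst_case_kl, weights) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(weights, state):
    """Toy linear-softmax policy: one weight row per discrete action."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(logits)


def kl(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sign(x):
    return (x > 0) - (x < 0)


def pgd_worst_case_kl(weights, state, eps, steps=10, seed=0, h=1e-5):
    """Approximate max over |delta|_inf <= eps of KL(pi(.|s) || pi(.|s + delta))."""
    rng = random.Random(seed)
    clean = policy(weights, state)

    def objective(delta):
        perturbed = [x + d for x, d in zip(state, delta)]
        return kl(clean, policy(weights, perturbed))

    # Random start inside the ball: the KL gradient vanishes at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in state]
    best_delta, best_val = list(delta), objective(delta)
    step_size = 2.5 * eps / steps

    for _ in range(steps):
        # Central finite differences stand in for backpropagation here.
        grad = []
        for i in range(len(delta)):
            up, down = list(delta), list(delta)
            up[i] += h
            down[i] -= h
            grad.append((objective(up) - objective(down)) / (2 * h))
        # Signed ascent step, then projection (clipping) back onto the ball.
        delta = [max(-eps, min(eps, d + step_size * sign(g)))
                 for d, g in zip(delta, grad)]
        val = objective(delta)
        if val > best_val:
            best_delta, best_val = list(delta), val
    return best_delta, best_val


weights = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1], [0.2, 0.1, -0.9]]
state = [0.5, -1.0, 0.25]
delta, reg = pgd_worst_case_kl(weights, state, eps=0.1)
print("perturbation:", delta)
print("regularizer value:", reg)
```

In a full training loop, a term like reg would be added to the policy loss with some weight, so that the outer minimization pushes the policy to behave similarly on clean and perturbed observations.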

Code

I implemented robust deep RL with a Soft Actor-Critic approach for discrete action spaces, and tested SA-SAC in several Atari Gym environments. The SAC code is based on bernomone's GitHub code.

Train SA-SAC agent

First, create three new directories: saved_models, vidoes, and Logs.

Before you start training, set n_steps, memory_size, train_start, reg_train_start … in the config01.json file.
n_steps: total number of steps you want to train.
memory_size: size of the replay buffer memory.
train_start: number of steps after which training begins.
reg_train_start: number of steps after which training with the SA-Regularizer begins.
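A quick sanity check on these settings might look like the following sketch. The numeric values are placeholders, not the repository's defaults, and the real config01.json contains further keys that are not shown. The ordering rule (train_start before reg_train_start, both before n_steps) is an assumption inferred from the descriptions above, and check_schedule is a hypothetical helper that is not part of the repository.

```python
import json

# Placeholder values for illustration only; the real config01.json may use
# different numbers and contains further keys not shown here.
EXAMPLE_CONFIG_TEXT = """
{
  "n_steps": 1000000,
  "memory_size": 100000,
  "train_start": 20000,
  "reg_train_start": 200000
}
"""


def check_schedule(cfg):
    """Return a list of schedule problems (empty when the ordering looks sane).

    Hypothetical helper: the ordering rule is an assumption, not repo code.
    """
    problems = []
    if cfg["train_start"] >= cfg["n_steps"]:
        problems.append("train_start must be smaller than n_steps")
    if cfg["reg_train_start"] < cfg["train_start"]:
        problems.append("reg_train_start should not precede train_start")
    if cfg["reg_train_start"] >= cfg["n_steps"]:
        problems.append("reg_train_start must be smaller than n_steps")
    return problems


config = json.loads(EXAMPLE_CONFIG_TEXT)
print(check_schedule(config))  # prints []
```

With reg_train_start set later than train_start, the agent would first train as vanilla SAC and only then switch on the SA-Regularizer.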

train.py (train vanilla SAC)

train.py 
	--config=config01.json (default)
	--new=1 (default) # set 0 when you load pretrained models
	--game=BeamRider (default) # set any Atari game environment

example: python train.py, python train.py --game=Assault
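The flag=value form used above matches what Python's argparse accepts. The sketch below shows how these three flags and their defaults could be declared; it is an illustration only, since the actual train.py may parse its arguments differently. The flag names and defaults are taken from the list above, while build_parser and the help texts are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented for train.py."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py flags")
    parser.add_argument("--config", default="config01.json",
                        help="path to the JSON config file")
    parser.add_argument("--new", type=int, default=1,
                        help="1 to train from scratch, 0 to load pretrained models")
    parser.add_argument("--game", default="BeamRider",
                        help="name of the Atari game environment")
    return parser


# Equivalent to running: python train.py --game=Assault
args = build_parser().parse_args(["--game=Assault"])
print(args.config, args.new, args.game)  # prints: config01.json 1 Assault
```

Note that each flag needs two ASCII hyphens; a long dash pasted from a formatted document will not be recognized as a flag.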

robust_train.py (train SA-SAC)

robust_train.py 
	--config=config01.json (default)
	--new=1 (default) # set 0 when you load pretrained models
	--game=BeamRider (default) # set any Atari game environment

example: python robust_train.py, python robust_train.py --game=Assault

generate_match_video.py

Renders an Atari game video using your trained models.

generate_match_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--random=False (default) # set 1 when you want to test random actions

example: python generate_match_video.py, python generate_match_video.py --game=Assault --random=1

PGD_generate_video.py

(+ PGD attack: adversarial perturbation on state observations)

Renders an Atari game video using your trained models.

PGD_generate_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--steps=10 (default) # number of PGD attack steps

example: python PGD_generate_video.py, python PGD_generate_video.py --game=Assault

evalulation.py

Tests trained models over several episodes.

evalulation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--iter=10 (default) # number of iterations (total number of episodes)

example: python evalulation.py, python evalulation.py --game=Assault --iter=30

pgd_evalulation.py

(+ PGD attack: adversarial perturbation on state observations)

Tests trained models over several episodes.

pgd_evalulation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--iter=10 (default) # number of iterations (total number of episodes)