Robust Deep RL with Soft Actor Critic approach

Robust Deep RL with a Soft Actor-Critic approach with adversarial perturbation on state observations

I designed a new robust deep RL method based on Soft Actor-Critic (SAC) that handles adversarial perturbations on state observations. The work builds on SA-MDP, proposed by Zhang et al. (2020). For a more detailed explanation, see the attached PDF file.

2022 Spring Semester, Personal Project Research, Kyungphil Park

SA-MDP (State-Adversarial MDP)

SA-MDP models a fixed adversary that perturbs the agent's state observations. The worst case is the adversary that minimizes the Q value, as in the following equations; Zhang et al. (2020) define this setting as an SA-MDP. (Equations from Zhang et al. (2020).)

[Equation image: SA-MDP definition; see the attached PDF file]
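For orientation, the SA-MDP value function can be sketched as below. This is recalled from Zhang et al. (2020), not copied from the attached PDF, so the notation is an assumption: \nu is the adversary that maps the true state s to a perturbed observation \nu(s) inside an allowed set B(s), and \pi \circ \nu is the policy acting on that perturbed observation.

```latex
\[
\tilde{V}_{\pi \circ \nu}(s)
  = \sum_{a} \pi\bigl(a \mid \nu(s)\bigr)
    \sum_{s'} p(s' \mid s, a)
    \bigl[ R(s, a, s') + \gamma \, \tilde{V}_{\pi \circ \nu}(s') \bigr],
\qquad \nu(s) \in B(s)
\]
\[
\nu^{*} = \arg\min_{\nu} \tilde{V}_{\pi \circ \nu}(s)
\]
```

The environment still transitions from the true state s; only the observation that the policy sees is perturbed.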

SA-SAC Regularizer

[Equation image: SA-SAC regularizer; see the attached PDF file]
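As a rough illustration of what a state-adversarial regularizer measures in a discrete action space, the sketch below computes the largest KL divergence between the policy's action distribution at the clean state and at a set of perturbed states. The KL form follows the policy regularizers in Zhang et al. (2020); whether SA-SAC uses exactly this form is an assumption, and the function names `kl_divergence` and `worst_case_kl` are hypothetical.

```python
import math


def kl_divergence(p, q, tiny=1e-12):
    """KL(p || q) for two categorical distributions given as lists of probabilities."""
    # `tiny` guards against log(0) when a probability is exactly zero.
    return sum(pi * math.log((pi + tiny) / (qi + tiny)) for pi, qi in zip(p, q))


def worst_case_kl(clean_probs, perturbed_probs_list):
    """Largest KL between the clean action distribution and any perturbed one.

    perturbed_probs_list holds the policy outputs at candidate perturbed
    states (assumed to lie inside the allowed perturbation set).
    """
    return max(kl_divergence(clean_probs, q) for q in perturbed_probs_list)


if __name__ == "__main__":
    clean = [0.5, 0.5]
    candidates = [[0.5, 0.5], [0.9, 0.1]]
    print(worst_case_kl(clean, candidates))
```

A small value means the policy chooses nearly the same actions under perturbation; the regularizer penalizes large values.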

SA-SAC

In our work, we need to solve a minimax problem: minimizing the policy loss under the worst-case perturbation of the state observation.

  • objective function

[Equation image: SA-SAC objective function; see the attached PDF file]
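A generic form of such a minimax objective, in the spirit of the regularized objectives in Zhang et al. (2020), is sketched below. It is an assumption, not the exact objective from the attached PDF: L_SAC is the usual SAC policy loss, \kappa is a hypothetical regularization weight, B(s) is the allowed perturbation set, and the KL term is one possible choice of distance.

```latex
\[
\min_{\theta} \;
  L_{\mathrm{SAC}}(\theta)
  + \kappa \, \mathbb{E}_{s}
    \Bigl[ \max_{\hat{s} \in B(s)}
      D_{\mathrm{KL}}\bigl( \pi_{\theta}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid \hat{s}) \bigr)
    \Bigr]
\]
```

The inner maximization searches for the worst-case perturbed observation; the outer minimization trains the policy against it.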

Code

I implemented robust deep RL with a Soft Actor-Critic approach in a discrete action space and tested SA-SAC in several Atari Gym environments. The SAC code is based on bernomone's GitHub code [2].

Train SA-SAC agent

First, create three new directories: saved_models, vidoes, and Logs.

  • Before you start training, set n_steps, memory_size, train_start, reg_train_start … in the config01.json file.
  • n_steps: total number of steps to train.
  • memory_size: replay buffer size.
  • train_start: number of steps after which training begins.
  • reg_train_start: number of steps after which training with the SA-Regularizer begins.
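The settings above could be checked before a long run with a small script like the one below. It is only a sketch: the numeric values are placeholders rather than the project's defaults, config01.json may contain additional keys, and `load_config` and `check_config` are hypothetical helpers that are not part of the repository.

```python
import json

# Placeholder values for illustration only; not the repository's defaults.
EXAMPLE_CONFIG = {
    "n_steps": 1000000,
    "memory_size": 100000,
    "train_start": 20000,
    "reg_train_start": 500000,
}

REQUIRED_KEYS = ("n_steps", "memory_size", "train_start", "reg_train_start")


def load_config(path):
    """Read a JSON config file into a dict."""
    with open(path) as f:
        return json.load(f)


def check_config(cfg):
    """Raise ValueError if a required key is missing or not a positive integer."""
    for key in REQUIRED_KEYS:
        if key not in cfg:
            raise ValueError("missing config key: " + key)
        if not isinstance(cfg[key], int) or cfg[key] <= 0:
            raise ValueError("config key must be a positive integer: " + key)
    return cfg


if __name__ == "__main__":
    print(check_config(EXAMPLE_CONFIG))
```

Calling `check_config(load_config("config01.json"))` would then fail early if one of the four documented keys is missing.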

train.py (train vanilla SAC)

train.py
	--config=config01.json (default)
	--new=1 (default) # set 0 to load pretrained models
	--game=BeamRider (default) # set any Atari game environment
  • example: python train.py, python train.py --game=Assault
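The flags listed above could be parsed with `argparse` roughly as follows. This is a sketch of how such a command line is commonly wired up, not the actual parser in train.py; the flag names and defaults come from the usage text, while `build_parser` and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Parser mirroring the documented train.py flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Train a SAC agent on an Atari game.")
    parser.add_argument("--config", default="config01.json",
                        help="path to the JSON config file")
    parser.add_argument("--new", type=int, default=1,
                        help="1 to train from scratch, 0 to load pretrained models")
    parser.add_argument("--game", default="BeamRider",
                        help="name of the Atari game environment")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.config, args.new, args.game)
```

With this layout, `--game=Assault` and `--game Assault` are both accepted.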

robust_train.py (train SA-SAC)

robust_train.py
	--config=config01.json (default)
	--new=1 (default) # set 0 to load pretrained models
	--game=BeamRider (default) # set any Atari game environment
  • example: python robust_train.py, python robust_train.py --game=Assault

generate_match_video.py

  • render an Atari game video with your trained models.
generate_match_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--random=False (default) # set 1 to test random actions
  • example: python generate_match_video.py, python generate_match_video.py --game=Assault --random=1

PGD_generate_video.py

(+ PGD attack: adversarial perturbation on state observations)

  • render an Atari game video with your trained models.
PGD_generate_video.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--steps=10 (default) # set the number of PGD attack steps
  • example: python PGD_generate_video.py, python PGD_generate_video.py --game=Assault
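To show what a PGD attack on a state observation does, here is a minimal, dependency-free sketch. It is not the code in PGD_generate_video.py: the linear-softmax `toy_policy` stands in for the trained actor, the attack maximizes the KL divergence between clean and perturbed action distributions (an assumed attack loss), gradients are estimated by finite differences instead of backpropagation, and `epsilon` and `step_size` are hypothetical parameters. Only the default of 10 steps is taken from the usage text.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy(state, weights):
    """Toy linear-softmax policy; a stand-in for the trained actor network."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def kl_divergence(p, q, tiny=1e-12):
    return sum(pi * math.log((pi + tiny) / (qi + tiny)) for pi, qi in zip(p, q))


def sign(x):
    return (x > 0) - (x < 0)


def pgd_attack(state, weights, epsilon=0.05, steps=10, step_size=None, seed=0, h=1e-5):
    """Search the L-infinity ball of radius epsilon for a state that shifts the policy."""
    if step_size is None:
        step_size = epsilon / 4.0
    rng = random.Random(seed)
    clean = toy_policy(state, weights)

    def loss(s):
        return kl_divergence(clean, toy_policy(s, weights))

    # Random start: the KL gradient is zero exactly at the clean state.
    adv = [s + rng.uniform(-epsilon, epsilon) for s in state]
    for _ in range(steps):
        grad = []
        for i in range(len(adv)):
            plus = adv[:i] + [adv[i] + h] + adv[i + 1:]
            minus = adv[:i] + [adv[i] - h] + adv[i + 1:]
            grad.append((loss(plus) - loss(minus)) / (2.0 * h))
        # Signed gradient ascent step, then projection back into the ball.
        adv = [a + step_size * sign(g) for a, g in zip(adv, grad)]
        adv = [min(max(a, s - epsilon), s + epsilon) for a, s in zip(adv, state)]
    return adv


if __name__ == "__main__":
    state = [0.2, -0.1, 0.4]
    weights = [[1.0, -2.0, 0.5], [-1.0, 0.3, 2.0]]
    print(pgd_attack(state, weights))
```

For image observations, a real attack would also clip the perturbed state to the valid pixel range, which this sketch omits.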

evalulation.py

  • test trained models for several episodes.
evalulation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--iter=10 (default) # set the number of iterations (total number of episodes)
  • example: python evalulation.py, python evalulation.py --game=Assault --iter=30

pgd_evalulation.py

(+ PGD attack: adversarial perturbation on state observations)

  • test trained models for several episodes.
pgd_evalulation.py
	--config=config01.json (default)
	--seed=0 (default)
	--game=BeamRider (default) # set any Atari game environment
	--iter=10 (default) # set the number of iterations (total number of episodes)
  • example: python pgd_evalulation.py, python pgd_evalulation.py --game=Assault --iter=30

Results

[Two result figures; images not reproduced]

References

[1] Zhang et al. (2020). Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations.

[2] bernomone. Discrete Soft Actor Critic. GitHub code.
