
Using deep reinforcement learning to design a broadband acoustic cloak. Created under the supervision of Dr. Feruza Amirkulova and Dr. Peter Gerstoft, with the help of Linwei Zhou, Peter Lai, and Amaris De La Rosa.


Reinforcement Learning Applied To Metamaterial Design

Our aim in this research is to use reinforcement learning to design a broadband acoustic cloak through inverse design. For more information on the project, you can view our presentations:

Demo

These are example episodes of the trained DDPG (left) and DDQN (right) agents controlling the positioning of 4 cylinders from a random configuration to one which produces low total scattering cross section (TSCS). Both algorithms were trained for 8000 episodes to minimize the root mean square (RMS) of TSCS across a range of wavenumbers from 0.35 to 0.45 ka.

Example usage

from tscsRL.environments.TSCSEnv import ContinuousTSCSEnv
from tscsRL.agents import ddpg

# Environment: 2 cylinders, TSCS evaluated at 11 wavenumbers in the 0.35-0.45 ka range
env = ContinuousTSCSEnv(
	nCyl=2,
	kMax=0.45,
	kMin=0.35,
	nFreq=11,
	stepSize=0.5)

# Start from the default hyperparameters and override a few
params = ddpg.default_params()
params['save_every'] = 100       # checkpointing interval (episodes)
params['decay_timesteps'] = 100  # timesteps over which exploration noise decays
params['num_episodes'] = 120
params['noise_scale'] = 1.1      # initial scale of the Gaussian action noise
params['save_data'] = False
params['use_wandb'] = True       # log metrics to Weights & Biases

# Name of the run; results and checkpoints are saved under this name
name = 'test_ddpg'

agent = ddpg.DDPGAgent(
	env.observation_space,
	env.action_space,
	params,
	name)

agent.learn(env)

Diagrams of training loops: DDPG and DDQN.

Credits

Images: Linwei Zhou

Inspiration for structuring agents: Ray


Issues

Create a better way of applying actions

Currently, the way we apply an action to a configuration is simply to add the action vector to the coordinates of the current configuration. If the resulting configuration is invalid (overlapping cylinders or cylinders beyond the walls), we reject it, revert to the original configuration, and give a negative reward. This system likely limits how many states the agent sees, since every illegal move returns the environment to the state it was in before the move. We need a way to apply partial actions to the environment in a consistent and time-efficient manner; one possible approach is sketched after the test cases below.

Test Cases:

  • New step function allows partial actions
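
One possibility is to shrink an infeasible action until it becomes feasible. Below is a minimal sketch of such a step, not the repository's code: is_valid, RADIUS, and GRID are hypothetical stand-ins for the environment's own validity check and bounds.

import numpy as np

RADIUS = 1.0  # hypothetical cylinder radius
GRID = 5.0    # hypothetical half-width of the square region

def is_valid(config):
    # config: (nCyl, 2) array of cylinder centers
    if np.any(np.abs(config) > GRID - RADIUS):
        return False  # a cylinder crosses a wall
    dists = np.linalg.norm(config[:, None] - config[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    return bool(np.all(dists > 2 * RADIUS))  # no overlapping cylinders

def partial_step(config, action, n_tries=10):
    # Apply the largest feasible fraction of the action, halving the step
    # until the resulting configuration is valid.
    scale = 1.0
    for _ in range(n_tries):
        candidate = config + scale * action
        if is_valid(candidate):
            return candidate, scale
        scale *= 0.5
    return config, 0.0  # no feasible fraction found; stay put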

Design a better reward function which works for all wavenumber ranges

Try creating a better reward function that is universal across all wavenumber ranges. So far, simple reward functions have given acceptable results, but maybe there is a better solution; one candidate is sketched after the test cases below.

You can modify the reward function (getReward) in the env.py file in the DDPG folder.

Helpful videos:
https://www.youtube.com/watch?v=0R3PnJEisqk&t=4s
https://www.youtube.com/watch?v=PYylPRX6z4Q&t=38s

Test cases

  • New reward function improves the lowest RMS TSCS discovered (< 0.45)
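
One candidate rewards the per-step improvement in RMS TSCS rather than its absolute value, so the reward has a similar scale for any wavenumber range. This is a hypothetical sketch, not the current getReward; tscs is the vector of TSCS values across the sampled wavenumbers.

import numpy as np

def get_reward(tscs, prev_rms=None):
    # Reward the reduction in RMS TSCS relative to the previous step so the
    # signal is comparable across wavenumber ranges.
    rms = float(np.sqrt(np.mean(np.square(tscs))))
    reward = -rms if prev_rms is None else prev_rms - rms
    return reward, rms  # carry rms forward as the next step's prev_rms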

Create and test policy gradient

Currently, with DDQN using discrete actions and a step size of 0.5, the lowest scattering the agent can find is ~0.45. This is not low enough. With a continuous action space we may get better results.

Exploration with parameter noise.

  • Create a new branch to work on this issue.

Currently, the way our DDPG explores is that the actor generates an action, represented by an 8-by-1 vector, and noise sampled from a normal distribution with mean 0 and scale epsilon is added to it. This is an acceptable way to explore, but perhaps there is a better one. OpenAI published a post on using parameter noise for exploration: https://openai.com/blog/better-exploration-with-parameter-noise/

Read this post and implement it in our DDPG by adding new noisy neural networks to the models.py file.

Implement and test Ornstein-Uhlenbeck noise (a minimal sketch follows the test cases below).

Test cases:

  • Parameter noise implemented in neural networks.

  • Run experiments to see if it helps.
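
For the Ornstein-Uhlenbeck part, here is a minimal self-contained sketch of the standard OU process used for DDPG exploration; the parameter values are conventional defaults, not tuned for this environment.

import numpy as np

class OUNoise:
    # Temporally correlated exploration noise: each sample drifts back toward
    # mu at rate theta while being perturbed by Gaussian noise of scale sigma.
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.reset()

    def reset(self):
        self.state = self.mu.copy()

    def sample(self):
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state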

Create a new way to train the network

As described in the first issue, the way we currently apply an action is to add the action vector to the coordinates of the current configuration; if the result is invalid (overlapping cylinders or cylinders beyond the walls), we reject it, revert to the original configuration, and give a negative reward. Every illegal move therefore returns the environment to the state it was in before the move, which likely limits how many states the agent sees.

To solve this, we can first train the agent to output valid actions. Instead of returning to the original state when an invalid action is given, we can execute the invalid action and give the agent a penalty, so the agent can learn from its own mistakes.

After this pretraining, we can transfer the weights to a new agent, minimizing the number of invalid actions it takes. The new agent should speed up training and reduce the exploration problem; a sketch of the transfer is below.
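
Continuing from the example usage above, a sketch of the two-stage idea. The names are hypothetical: it assumes the agent exposes its PyTorch networks as actor and critic attributes, and that validity_env is a copy of the environment whose reward only penalizes invalid actions.

# Stage 1: pretrain on action validity alone (hypothetical validity_env).
pretrained = ddpg.DDPGAgent(env.observation_space, env.action_space,
                            params, 'validity_pretrain')
pretrained.learn(validity_env)

# Stage 2: copy the weights into a fresh agent and train on the real TSCS reward.
agent = ddpg.DDPGAgent(env.observation_space, env.action_space,
                       params, 'tscs_agent')
agent.actor.load_state_dict(pretrained.actor.state_dict())
agent.critic.load_state_dict(pretrained.critic.state_dict())
agent.learn(env)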

Improve on old RL code

This is the first code written for this RL project, using a static dataset. Somehow it performs better than the methods we have now. It is simply a critic network which learns the change in mean TSCS caused by an action. Optimized configurations are discovered by selecting random actions from a starting configuration and choosing the one which suppresses the scattering the most; the search is sketched below.

Code is located at src/Tristan_Shah/PyTorch_Projects/cleanImplementation.ipynb
https://drive.google.com/drive/folders/17lw1r6YJOb0TpFJqJ51uzfq-hK81HLB-?usp=sharing
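
A sketch of that search, assuming critic(states, actions) predicts the change in mean TSCS for a batch of candidate actions (names and dimensions are illustrative):

import torch

def best_random_action(critic, state, act_dim, n_samples=128, step_size=0.5):
    # state: tensor of shape (1, obs_dim).
    # Sample random candidate actions and pick the one the critic predicts
    # will lower mean TSCS the most (most negative predicted change).
    actions = (torch.rand(n_samples, act_dim) * 2.0 - 1.0) * step_size
    states = state.expand(n_samples, -1)
    predicted_change = critic(states, actions).squeeze(-1)
    return actions[predicted_change.argmin()]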

Test cases:

  • Move code into repository

  • Replicate results achieved earlier (< 0.3 RMS TSCS)

  • Improve on the results (maybe use CNN)

Convert environments to gym Env

Current behavior is that states are passed to the Actor network, which generates an action and scales it to a specified range. To simplify the code and support different action scales, we need to use gym action spaces; a sketch of the scaling follows the test cases below.

Test cases:

  • Change environments to gym.Env
  • Scale actions to action range specified in gym action space
  • Test the DDPG training cycle to see if it converges
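
A sketch of the scaling step, assuming the actor outputs values in [-1, 1] and the environment defines a gym Box action space (the shape and bounds below are illustrative):

import numpy as np
from gym import spaces

nCyl, stepSize = 4, 0.5
# One (dx, dy) displacement per cylinder, bounded by the step size.
action_space = spaces.Box(low=-stepSize, high=stepSize,
                          shape=(2 * nCyl,), dtype=np.float32)

def scale_action(raw_action, space):
    # Map an actor output in [-1, 1] to the range of the Box space.
    raw_action = np.asarray(raw_action, dtype=np.float32)
    return space.low + (raw_action + 1.0) * 0.5 * (space.high - space.low)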

Use convolutional networks to process image data in addition to the standard information.

  • Create new branch to work on this feature.

  • In models.py in the DDPG folder you can create two more models (ImageActor, ImageCritic) which are able to process images in addition to the standard data passed (a sketch appears after the test cases below).

  • Create a new ImageDDPG object which inherits from DDPG and overrides any methods you need to change.

Notes:

  • We already have a function in env.py which produces an image from a configuration of cylinders. Call env.getImage(env.config) to produce the image.

  • You will also need to modify the way we store data; I suggest adding two additional fields to the namedtuple on line 51 in ddpg.py:

namedtuple('Transition', ('s', 'img', 'a', 'r', 's_', 'nextImage', 'done'))

Test cases:

  • We can specify the architecture of the image nets through their constructors.
  • There is a file we can run to initiate training with an ImageDDPG agent.
  • Run several experiments and see if it helps.
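
A minimal sketch of what ImageActor could look like; the layer sizes are illustrative, and ImageCritic would be analogous but take the action as an extra input and output a single Q-value.

import torch
import torch.nn as nn

class ImageActor(nn.Module):
    # Encodes the configuration image with a small CNN and concatenates the
    # result with the standard state vector before the action head.
    def __init__(self, state_dim, action_dim, action_range):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # image -> 32 features
        self.head = nn.Sequential(
            nn.Linear(32 + state_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh())
        self.action_range = action_range

    def forward(self, state, img):
        # state: (batch, state_dim); img: (batch, 1, H, W)
        features = torch.cat([self.conv(img), state], dim=-1)
        return self.head(features) * self.action_range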

Parallelize code so we can run multiple environments at the same time.

  • Create a new branch to work on this issue.

Currently the code operates as one agent interacting with one environment, which slows training down significantly. If we have multiple environments generating data asynchronously and update the agent at each learning step, we will reduce training time; a sketch using Ray follows the links below.

Note: Ray seems like a good library to use for this task.
https://github.com/ray-project/ray

Ray tutorial
https://www.youtube.com/watch?v=q_aTbb7XeL4
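
A sketch of asynchronous data collection with Ray actors. It assumes the gym-style environment API from the issue above; the random policy is a stand-in for a synchronized copy of the actor network, and make_env is a hypothetical environment factory.

import ray

@ray.remote
class EnvWorker:
    # Each worker owns one environment instance and collects transitions
    # independently of the others.
    def __init__(self, env_fn):
        self.env = env_fn()

    def rollout(self, n_steps):
        transitions = []
        state = self.env.reset()
        for _ in range(n_steps):
            action = self.env.action_space.sample()  # stand-in for the policy
            next_state, reward, done, _ = self.env.step(action)
            transitions.append((state, action, reward, next_state, done))
            state = self.env.reset() if done else next_state
        return transitions

# Usage: several workers collect experience concurrently.
# ray.init()
# workers = [EnvWorker.remote(make_env) for _ in range(4)]
# batches = ray.get([w.rollout.remote(100) for w in workers])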

Test cases:

  • Able to control number of workers

  • See increased GPU utilization as a result of parallelization

Save data from runs to database

Create some code which allows us to save (state, action, reward, next_state, done) tuples to a database of some kind, including the ability to save images; a sketch of the storage layer is below.

Try using some RL techniques to test different hyperparameters on this data.
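
A minimal sketch using the standard library's sqlite3; states, actions, and images are pickled so they can be read straight back into Python.

import pickle
import sqlite3

conn = sqlite3.connect('transitions.db')
conn.execute("""CREATE TABLE IF NOT EXISTS transitions (
    state BLOB, action BLOB, reward REAL,
    next_state BLOB, done INTEGER, image BLOB)""")

def save_transition(s, a, r, s_, done, img=None):
    # Serialize with pickle so values round-trip via pickle.loads.
    conn.execute("INSERT INTO transitions VALUES (?, ?, ?, ?, ?, ?)",
                 (pickle.dumps(s), pickle.dumps(a), float(r),
                  pickle.dumps(s_), int(done), pickle.dumps(img)))
    conn.commit()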

Test cases

  • Able to save data

  • Able to easily read the data back into Python

  • Run experiments on the data to determine which hyperparameters are best, specifically the number of hidden layers, neurons per layer, gamma, and optimizer type.

Transfer to Matlab

Transfer the code to MATLAB so we can use MATLAB's Parallel Computing Toolbox.

Create a new branch and a new folder which contains the MATLAB code.

Test Cases:

  • Successfully implement our Python environment as a MATLAB RL environment

  • Successfully train a MATLAB RL agent on the environment

Increase number of cylinders using Multi-Agent DDPG

Attempting to increase the number of cylinders in the environment with a single agent shows no sign of convergence, perhaps because the problem becomes much more complex as the number of design parameters grows.

If we expand the number of agents, the problem may become simple enough for each agent to solve; a sketch of the centralized critic is below the links.

Here is a link to a paper on Multi-Agent DDPG: https://arxiv.org/pdf/1706.02275.pdf

https://towardsdatascience.com/openais-multi-agent-deep-deterministic-policy-gradients-maddpg-9d2dad34c82
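
A sketch of an MADDPG-style centralized critic: it sees the observations and actions of all agents during training, while each agent keeps its own decentralized actor. Dimensions and layer sizes are illustrative.

import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    # Scores the joint state-action of all agents; each agent's actor
    # remains decentralized and sees only its local observation.
    def __init__(self, obs_dim, act_dim, n_agents, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, all_obs, all_actions):
        # all_obs: (batch, n_agents * obs_dim); all_actions: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_actions], dim=-1))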

Test cases:

  • Implement multi agent environment

  • Implement multi agent DDPG actors

  • Implement centralized DDPG critic

  • Show an improvement in performance over single-agent DDPG
