Adaptive importance sampling modification to MPPI

License: MIT License


MPOPIS (Model Predictive Optimized Path Integral Strategies)

Short YouTube video talking about MPOPIS

arXiv Paper

MPOPIS is a version of model predictive path integral control (MPPI) that incorporates adaptive importance sampling (AIS) algorithms into the original importance sampling step. Model predictive optimized path integral control (MPOPI) is more sample efficient than MPPI, achieving better performance with fewer samples. A video of MPPI and MPOPI controlling 3 cars side by side for comparison can be seen here. More details can be found in the wiki.

The addition of AIS enables the algorithm to use a better set of samples when calculating the control. A depiction of how the samples evolve over iterations can be seen in the following gif.

MPOPI (CE) 150 Samples, 10 Iterations

Policy Options

Versions of MPPI and MPOPI implemented

  • MPPI and GMPPI
    • MPPI (:mppi): Model Predictive Path Integral Control [1][2]
    • GMPPI (:gmppi): generalized version of MPPI, treating the control sequence as one control vector with a combined covariance matrix
  • MPOPI
    • i-MPPI (:imppi): iterative version of MPPI, similar to μ-AIS but without the decoupled inverse temperature parameter; μ-AIS is equivalent to i-MPPI when λ_ais = λ
    • PMC (:pmcmppi): population Monte Carlo algorithm with one distribution [3]
    • μ-AIS (:μaismppi): mean-only moment-matching AIS algorithm
    • μΣ-AIS (:μΣaismppi): mean and covariance moment-matching AIS algorithm, similar to Mixture-PMC [4]
    • CE (:cemppi): cross-entropy method [5][6]
    • CMA (:cmamppi): covariance matrix adaptation evolutionary strategy [5][7]

For implementation details, reference the source code. For the simulation parameters used, reference the wiki.

Getting Started

Use the Julia package manager to add the MPOPIS package:

] add https://github.com/sisl/MPOPIS
using MPOPIS

If you want to use the MuJoCo environments, ensure you have envpool installed in your PyCall distribution:

install_mujoco_requirements()

Now, we can use the built-in example to simulate the MountainCar environment:

simulate_mountaincar(policy_type=:cemppi, num_trials=5)
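Any of the policy symbols listed under Policy Options can be passed as the policy_type keyword. For example, to try the μΣ-AIS variant instead (only the policy symbol changes):

simulate_mountaincar(policy_type=:μΣaismppi, num_trials=5)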

Simulate the Car Racing environment and save a gif:

simulate_car_racing(save_gif=true)

You can also plot the trajectories and simulate multiple cars:

simulate_car_racing(num_cars=3, plot_traj=true, save_gif=true)

Run a MuJoCo environment:

simulate_envpool_env(
    "HalfCheetah-v4";
    frame_skip = 5,
    num_trials = 2,
    policy_type = :cemppi,
    num_steps = 50,
    num_samples = 100,
    ais_its = 5,
    λ = 1.0,
    ce_Σ_est = :ss,
    seed = 1,
    output_acts_file = true,
)

The output should be something similar to:

Env Name:                     HalfCheetah-v4
Num Trails:                   2
Num Steps:                    50
Policy Type:                  cemppi
Num samples                   100
Horizon                       50
λ (inverse temp):             1.00
α (control cost param):       1.00
# AIS Iterations:             5
CE Elite Threshold:           0.80
CE Σ Est Method:              ss
U₀                            [0.0000, ..., 0.0000]
Σ                             0.2500 0.2500 0.2500 0.2500 0.2500 0.2500 
Seed:                         1

Trial    #:       Reward :   Steps:  Reward/Step : Ex Time
Trial    1:       115.46 :      50:         2.31 :   19.55
Trial    2:       126.08 :      50:         2.52 :   19.53
-----------------------------------
Trials AVE:       120.77 :   50.00:         2.42 :   19.54
Trials STD:         7.51 :    0.00:         0.15 :    0.02
Trials MED:       120.77 :   50.00:         2.42 :   19.54
Trials L95:       115.46 :   50.00:         2.31 :   19.53
Trials U95:       126.08 :   50.00:         2.52 :   19.55
Trials MIN:       115.46 :   50.00:         2.31 :   19.53
Trials MAX:       126.08 :   50.00:         2.52 :   19.55

The output_acts_file option outputs a CSV with the actions for the given environment. If you have the required Python libraries installed (i.e., gym, numpy, imageio, and argparse), you can use the provided Python script to generate a gif. By default, the simulate_envpool_env function writes the action CSV to the ./acts directory. The parameters to make_mujoco_gif.py are:

  • -env: environment name (e.g. 'Ant-v4')
  • -af: action csv file
  • -o: output gif file name without the extension (e.g. 'output_fname')

Using one of the above action files:

python ./src/envs/make_mujoco_gif.py -env HalfCheetah-v4 -af ./acts/HalfCheetah-v4_5_cemppi_50_2_1_50_1.0_1.0_0.0_0.25_100_5_0.8_sstrial-2.csv -o HalfCheetah-v4_output_gif

Citation

@inproceedings{Asmar2023,
  title = {Model Predictive Optimized Path Integral Strategies},
  author = {Dylan M. Asmar and Ransalu Senanayake and Shawn Manuel and Mykel J. Kochenderfer},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year = {2023}
}

Questions

In the paper, during the control cost computation (Algorithm 1, line 9), the noise is sampled from Σ′, but the given Σ is used for the inversion. Is this a typo?

No. The control cost is computed using the covariance matrix Σ. Even as we adapt the proposal distribution, we calculate the control cost based on the original distribution, which accounts for the appropriate change in control amount as the proposal distribution changes.
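As a rough Julia sketch of this point (the function and variable names are illustrative assumptions, not an excerpt of the code base or of Algorithm 1), the per-trajectory control cost keeps the original Σ even though the noise was drawn using the adapted Σ′:

using LinearAlgebra

# Illustrative sketch: control cost over a horizon of T steps using the original covariance Σ.
# U[:, t] is the nominal control and E[:, t] the sampled noise at step t (sampled with Σ′).
function control_cost(U, E, Σ, λ)
    T = size(U, 2)
    return λ * sum(U[:, t]' * inv(Σ) * E[:, t] for t in 1:T)
end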

The algorithm does not use the updated covariance from one MPC iteration to the next. Why is this the case?

Using the updated covariance matrix in subsequent iterations of the MPC algorithm could result in faster convergence during the next AIS iterations. However, it would likely decrease robustness (without robustness considerations in the AIS step). We considered showing results that used the updated covariance matrix, but we wanted to focus on the core contributions of the paper and left that for future work.

In the algorithm, the trajectory costs used to update the control parameters are not centered or normalized. Is this intentional?

This was intentional to align with previous versions of MPPI in the literature. There are other approaches that adjust the costs for numerical stability and to ease tuning across different environments. We do not anticipate a major change in performance if a step to adjust the costs is added.
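For reference, a common adjustment along those lines (not part of this code base; names here are illustrative) subtracts the minimum trajectory cost before exponentiating, which leaves the normalized weights unchanged but avoids numerical underflow:

# Illustrative sketch: importance weights from trajectory costs, with a baseline
# subtracted for numerical stability; the subtraction cancels after normalization.
function mppi_weights(costs, λ)
    β = minimum(costs)
    w = exp.(-(costs .- β) ./ λ)
    return w ./ sum(w)
end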

How does this compare to a version where MPPI is allowed to run a few iterations before using the control output? This approach is similar to previous work [8] and other MPC variants.

An iterative version of MPPI is similar to the approach we take in the paper. The main differences are the decoupling of the inverse temperature parameter and the ability to sample from a joint distribution versus separate distributions at each control step. The performance of μ-AIS is similar to the iterative version and outperformed a pure iterative MPPI version in our experiments.

References

Footnotes

  1. G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), 2017.

  2. G. R. Williams. Model predictive path integral control: Theoretical foundations and applications to autonomous driving. PhD thesis, Georgia Institute of Technology, 2019.

  3. O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13:907–929, 2004.

  4. O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert. Adaptive importance sampling in general mixture classes. Statistics and Computing, 18, 2008.

  5. M. J. Kochenderfer and T. A. Wheeler. Algorithms for Optimization. MIT Press, 2019.

  6. R. Y. Rubinstein and D. P. Kroese. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Vol. 133. New York: Springer, 2004.

  7. Y. El-Laham, V. Elvira, and M. F. Bugallo. Robust covariance adaptation in adaptive importance sampling. IEEE Signal Processing Letters, 25, 2018.

  8. J. Pravitra, E. A. Theodorou, and E. N. Johnson. Flying complex maneuvers with model predictive path integral control. In AIAA SciTech Forum, 2021.


mpopis's Issues

No CI

The repository is missing any CI integration, including tests and CompatHelper.

Question about the definition of reward()

Hello, I have a question about the reward() function here:

trajectory_cost = trajectory_cost - reward(env) + control_costs

I can only find it in CommonRLInterface. Is there an implementation of reward() in this MPPI code base? And how does the reward() function here relate to the pseudocode in the paper?
Thank you!

A question about the mean value of the iterative sampling

Hello!

I have another question about the code here:

pol.U = pol.U + vec(mean(elite, dims=2))

In the code above, the mean of the samples is added to pol.U in every iteration, and in the following code, the difference between pol.U and its original value is added to the final sampling result E (the LHS):
E = E .+ (pol.U - U_orig)

I think this means the mean of the final sampling result E (LHS) is the sum of the means from every iteration.

Can this cause the predicted trajectory calculated from E to be biased relative to the sampling in each iteration? Is it reasonable to just use the mean of the elite trajectories in the last iteration as the mean of the final sampling result, instead of adding all those means up?

Thank you for your patience!

`Space` is no longer exported by ReinforcementLearningBase from v0.10.0+

[ Info: Precompiling MPOPIS [e8a75bc8-90e1-4072-945a-20230e5738f6]
ERROR: LoadError: UndefVarError: `Space` not defined
Stacktrace:
 [1] top-level scope
   @ ~/.julia/packages/MPOPIS/sD3Z4/src/envs/car_racing.jl:28
 [2] include(mod::Module, _path::String)
   @ Base ./Base.jl:457
 [3] include(x::String)
   @ MPOPIS ~/.julia/packages/MPOPIS/sD3Z4/src/MPOPIS.jl:2
 [4] top-level scope
   @ ~/.julia/packages/MPOPIS/sD3Z4/src/MPOPIS.jl:67
 [5] include
   @ ./Base.jl:457 [inlined]
 [6] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
   @ Base ./loading.jl:2045
 [7] top-level scope
   @ stdin:3
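A possible workaround until the environment definitions are updated (the exact version bound is an assumption) is to pin ReinforcementLearningBase to a pre-0.10 release before adding MPOPIS:

] add ReinforcementLearningBase@0.9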

No Compat entries

There are no compat entries in the Project.toml, which causes compatibility issues as dependencies are updated.

Package precompile failed

After I installed the package, precompilation of Conda, PyCall, and MPOPIS failed. I precompiled again and got the same error (screenshot omitted). To solve the problem, I first reinstalled Conda and PyCall, and then reinstalled MPOPIS. PyCall and Conda worked this time, but MPOPIS still failed to precompile (screenshot omitted). Is there anything wrong? Did I forget anything in the installation?
