farama-foundation / metaworld Goto Github PK

View Code? Open in Web Editor NEW

1.1K 30.0 260.0 93.7 MB

Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning

Home Page: https://meta-world.github.io/

License: MIT License

Dockerfile 0.08% Shell 0.05% Python 97.89% Jupyter Notebook 1.98%

meta-rl multi-task benchmark-environments mujoco

metaworld's People

Contributors

Stargazers

Watchers

Forkers

mahimanzum fuxianh mishalaskin dragomirradev mschachter wwxfromtju chenxingqiang smith6036 chaohuang-ch hyzcn nagisazj jlqzzz alpslee watchernyu anair13 amrmkayid ji4chenli clinbeckett andreykurenkov upasana23 xiaxx244 kukuxia suryachinnu5 guitaricet weihua916 rchalyang sharathraparthy hyyh28 michaelzhiluo ysulsky idurugkar illidanlab amimem gcbbobo prograguo vivian0108 bryanoliveira yus-nas linzichuan ashwinreddy dscho1234 ugurkanates dylanturpin eparisotto kostis-s-z adibellathur rmrafailov bhairavmehta95 odellus pkol bernwang takuyahiraoka simon0xzx chzfeng shushman huangjiancong1 hartikainen rondorf halesmith neo-x buoyancy99 chenso121 averma-rice shenqianli yudixie haoyu-x kschmeckpeper swipswaps syc7446 mhauskn zchuning fatiepie huanghanchi tienhoangvan mcx jorge-a-mendez evdcush abdulhaim mihdalal 911091933 cbfinn irenebosque usaywook yaoxt3 r-ceph cosmoshua mcgillmrl melfm towneszhou moojink laynewong haichao-zhang hzyjerry artemzholus vitchyr ma-env adityabingi triball3 cloudenginehub zachyao

metaworld's Issues

All metaworld environments report done=True at max time steps.

Although this can be convenient for on-policy algorithms, it technically makes the environment partially observable. In particular, this is known to significantly increase the sample requirements for off-policy algorithms in many cases.

Since the goal of this benchmark is to compare different meta-learning methods, it would be best to have a version of this benchmark that does not unfairly discriminate against off-policy meta-RL methods (such as PEARL).

Change max_path_length to 200 for all envs

Mujoco 2.0.0

Does metaworld require mujoco 2.0.0? Or is there a way I can get it to work with the 1.5.x versions?

Evaluation Protocol

Hello,

I have some questions about the evaluation presented in the paper. Although this is not the best place for questions related to the paper, I think it could help to reproduce some experiments.

How many samples are used to train individual policies to reproduce Figure 6 ("Single Task Success Rate")? If this is different for each task, could you provide ate least an average number of samples?
Could you please provide the number of samples used to train the policies in Figure 7 and 8, in order to obtain the presented asymptotic performance?
Do you have the code used to those evaluations released? If so, could you provide them? I am referring to the methods themselves, such as SAC/PPO and MAML/PEARL/RL2.

Thank you so much for the support and for providing us this benchmark!

Remote rendering issue with env.render(mode='rgb_array') and env.get_image()

This is an issue I typically have with Mujoco-based simulations when running remotely. Basically rendering is problematic, even in rgb_array mode where we want to access the image frames.

Is there a workaround for this issue? The OpenAI gym envs have the same issue and I only know DMSuite somehow bypasses this - where I can render in rgb_array mode.

So the errors you would get are the followings :

env.render(mode='rgb_array')

GLFW error (code %d): %s 65542 b'EGL: Failed to get EGL display: Success'
Creating window glfw
X Error of failed request:  255
  Major opcode of failed request:  155 (GLX)
  Minor opcode of failed request:  5 (X_GLXMakeCurrent)
  Serial number of failed request:  136
  Current serial number in output stream:  137

env.get_image(width=84, height=84)

ERROR: GLEW initalization error: Missing GL version

Possible BUG report

Hi all, thanks for this amazing repo for meta RL and it's really helpful 👍. However, I have encountered some problems recently, and it seems there have few bugs in the codes.

I'm trying to modify the goal of tasks sampled from pick-place-v1, but it fails. And I try to look at the source codes to fix this problem:

In sawyer_xyz/base.py, the set_goal_() method deals with the attribute self.goal, while in sawyer_pick_and_place.py, it seems the task computes rewards based on self._state_goal.

I'm so confused about this. Are they the same? Or it's just a bug of inconsistent names? I hope someone can help me with this. Thanks a lot!

BTW, it would be much better if there is a documentation about the repo in the future 😊!

Derek

ML10/45 Goal Variation

Hello,

First, thank you for providing us this benchmark and for helping us to understand its details.

I have two questions about ML10/45 and paper results.

So, in these benchmarks, is there goal variation across episodes?
As an example, lets say I sample reach-v1 task in ML10. In the official benchmark, should I expect to have a constant goal during all episodes OR the goal can change after each reset?
In the paper, specifically in Figure 6 ("Single Task Success Rate"), is the goal constant for each task? Or is there goal variation during learning as well?

Thanks!

Gym versions

Is there a reason gym version is pinned to precisely 0.12.1?

In general, what gym versions are compatible?

Regarding the done signal at max_path_length

Hello!

I read #45 and I understand why you decided to remove it, so sorry for referencing again this issue. However, I believe that as @krzentner said, there are many cases that it is helpful to have this functionality.

I also understand the decision that whoever wants to change this, will need to implement it themselves. But if there is a lot of interest of having this functionality, would it be possible to add this as an option when creating the environment?

In the same style that there are the arguments sample_all, sample_goals etc, there could be an argument that would toggle between sending done=True when reaching max_path_length or not. Personally this is not an issue as I have already implement this functionality in my code as you suggest so it would be more of a future enhancement if there is a demand for it.

Would love to hear your thoughts on it!

How to run unit tests?

I have metaworld installed locally with anaconda3 - how are the unit tests typically run from the command line?

Some questions regarding to the paper

First thank you for the awesome simulation environment! And here are some of my questions:

Can I know the your computational resources to run the experiments, and the corresponding running time for each of the tasks.
For the bottom three tasks of Figure 6, i.e., "push with wall", "disassemble nut" and "turn dial", the success rates are low. Are these tasks too hard to solve?
Can I know the standard definition of "structrual similarities"? In order word, how do you measure the similarities between tasks?
Can you elaborate the reason why multi-head SAC performs so much better that SAC with single head?
It is surprising that PPO performs so much worse that TRPO in MT50, which seems to be contradict with the resuls from model-free RL. Do you have a good explaination for this?
The results in Table 1 and Figure 8 seem to be inconsistent. For example, in ML45, Table 1 says PEARL performs the best on meta-test. However, it is the worst according to Figure 8.
Can you provide some evidence for the claims "special care needs to be taken to ensure that the task cannot be inferred directly from the image, else meta-learning algorithms will memorize the training tasks rather than learning to adapt"?

Some question about metaworld environment.

I'm interested in your MetaWorld.
It will be good benchmark about Meta-RL.
I have some questions about this benchmark.

When start at environment, sawyer arm move to specific pose during first K steps.(K is about 10)
It seems like 'init' function called by mujoco.
I think this will cause problems when agent do reinforcement learning.
Is this intended?
Reward gap is big between some environment.
When reach env get almost 100 at first time, push env get small reward(between 0 and 1).
Is this intended?

Custom Camera Views

Hello,

Thank you for the excellent code.

I'm trying to use the setup to collect simulated RGB data for a project but I haven't been able to find a way to set custom camera angle while rendering. All I can see is some predefined modes like topview. Anyway I can setup custom camera views for the tasks?

Thanks!

Scripted Policies for ML10/MT10 Environments

ML10/MT10 environments:

Vectorizing Envs over Many Workers Results in Memory Overflow

Currently, I'm using RLlib for running metaworld envs, where each worker runs many vectorized instances of the environment.

Tried running MAML/ProMP with 40 workers and 20 envs/worker on one of the Metaenvs (Push). Rllib can train on this for couple iterations before crashing due to memory overflow (can't allocate more memory). Not sure on what exactly is the issue, but do you have some leads on what could be the issue? I was thinking that it might be a memory leak, but trying this on a lower # of workers resulted in worse training overall but most importantly no crashing.

Missing Environments

If you try running scripts/demo_sawyer.py, many of the imports don't work because of missing environment such as
from metaworld.envs.mujoco.sawyer_xyz.sawyer_stack import SawyerStackEnv

Setting environment to a reference environment's state

Here is a particular gym registered version of a metaworld environment that I am using

register(
    id='MW-Reach-v0',
    entry_point='metaworld.envs.mujoco.sawyer_xyz.sawyer_reach_push_pick_place:SawyerReachPushPickPlaceEnv',
    kwargs=dict(task_type='reach', random_init=False)
)

In words, I'm trying to run CEM using ground truth dynamics for the reach task using fixed goal. The way I do this is by creating n copies of the environment and stepping on the copies. To make sure each copy starts at the current observation, I use the following code:

env = gym.make('MW-Reach-v0')
# ... rollout env to some state using standard `.step()`
copies = [gym.make('MW-Reach-v0') for _ in range(n)]
cur_state = env.get_env_state()
for i in range(n):
    copies[i].set_env_state(cur_state)

It does reach the success state every time. Is this the correct approach?

Further, now I would like to use random_init=True for randomized goals but the same approach doesn't work. Is it enough to just run .set_goal_(cur_env.goal) on each copy as below?

cur_state = env.get_env_state()
for i in range(n):
    copies[i].set_goal_(cur_env.goal)
    copies[i].set_env_state(cur_state)

It could potentially just be a hyperparameter issue, but just wanted to make sure if I have the interaction with environments correct. Thanks!

Inconsistency between GIF on github and implementation

https://github.com/rlworkgroup/metaworld/blob/b556580204bf704dabcd0b5e558850db7a4335c4/metaworld/envs/mujoco/env_dict.py#L158-L211

I'm currently working on a project and we are using ML45 GIF from here as a standard of our task suite. I've spotted some tasks in the GIF that are not in the dictionary above. Our goal is to include stack, unstack and open_box, which appear in the GIF, to our task suite.

I also found these environments in the dictionary that are not in the GIF: hand_insert, assembly_pug, pick_out_of_hole, shelf_remove, and sweep_tool. Additionally, shelf_remove and sweep_tool are imported in the file but they are not in the dictionary.

One last thing I would like to point out here is that environments pick_place and pick_place_wsg exist but are not used in the ML45 dictionary.

I'm just hoping that I don't miss the environments of interest in my task suite(stack, unstack, open_box)

Question regarding reward function and observation space

Hi!

I have a few questions about the reward function and observation spaces that I hope you can help with.

The reward function for some environments (e.g. reach) is not consistent in the paper and in the code. Should I modify the code so they are consistent, or I should just use the one in the code? The one in the code makes more sense for the reaching task.
What was the observation space used in the ML10 and ML45 evaluations? Earlier in the paper it mentioned that the observation is 9-dimensional consisting of hand position and two task-related positions. For ML1 it hides the goal position to be the task, so it has a 6-dimensional observation space. I'm wondering if ML10 and ML45 are the same (goal position hidden), or they can observe the true goal?

Thank you so much in advance!

Parallel Environments

Is there a straightforward way to run multiple environments in parallel or do you have to hack it using the wrapper from gym?

AttributeError: 'MjViewer' object has no attribute 'finish'

The rendering works fine until we call .close() on the environment.

Minimal code to reproduce the error

from metaworld.envs.mujoco.sawyer_xyz.sawyer_reach_push_pick_place import SawyerReachPushPickPlaceEnv

if __name__ == "__main__":
  env = SawyerReachPushPickPlaceEnv(task_type='push', random_init=True)

  obs = env.reset()
  done = False
  while not done:
    env.render()
    next_obs, reward, done, info = env.step(env.action_space.sample())

  env.close()

Stack trace

Creating window glfw
Traceback (most recent call last):
  File "try.py", line 13, in <module>
    env.close()
  File "/miniconda3/lib/python3.7/site-packages/metaworld/envs/mujoco/mujoco_env.py", line 135, in close
    self.viewer.finish()
AttributeError: 'MjViewer' object has no attribute 'finish'

Gym Version: 0.12.1
Mujoco-py Version: 2.0.2.8

ImageEnv compatibility with benchmark classes

Currently, the benchmark tasks on support state observations / goals. However, a lot of RL researchers would benefit from the ability to get image observations / goals. There's an ImageEnv available but it's incompatible with the benchmark classes ML1, etc. It works with single-task environments, but currently there's no goal-setting enabled there.

Would be useful either (a) to have image support for ML1, etc benchmarks or (b) enable goal setting for single task environments.

Is this currently possible? Maybe I missed something in the codebase.

Ensure max_path_length is 150 for all envs

Adding support for the Franka Panda robot

Hi,

Thanks a lot for this great open-source benchmark. Great work!

I was wondering are there any (hopefully short-term) plans to add support for the Franka Panda robot as part of the benchmark? This robot is becoming pretty common and standard in many libraries.

Thanks,
Gal

Never send termination signals

Remove useless codes

I noticed that there are more codes that need to be removed. For example this environment was not used in our benchmarking. Also, some scripts under the scripts folder are outdated.

We need to do another round of audit for this.

Static Task IDs for environments

For training scenarios where task one-hot ids are necessary for training, static task IDs are necessary across tasks for training and evaluation. An example of this is in Multitask RL algorithms where the inputs to the policy are task one-hot ids

Bug(or not) about drawer-close-v1 environment.

When I'm training my multi-task policy for Multi-Task 10 benchmark, I find that drawer-close-v1 task always succeeds at the first timestep of the episode.

I checked the implementation of the drawer-close and drawer-open environments, and I noticed that they use the same goal by default and they used the same condition to judge whether the task succeeded. I'm not sure whether this is correct, would you mind taking a look at it?

Feature request: meta-learning baselines

Hi! Thank you for your work!

It would be great to have some baselines that allow for a fast start with Metaworld (e.g. code used for the paper). Or at least to mention some repositories that use Metaworld.

Pre-trained policies for each env

I emailed this, but also seems appropriate to post here. We want to do some research that builds on top of your excellent code. I am guessing this gif figure was made using trained policies for each env. Any chance you could provide the trained policies for each individual env to aid with research that uses them? The code to generate gifs of a rollout for a would also be nice, if you can share it.

Benchmarks must be constructed before task names can be accessed

Currently in order to access all of the task names that belong to a benchmark's train/test set, the benchmark and all of its environments must first be constructed.

This is wasteful in the case that one needs to construct a benchmark's environments using the from task_api. The proper behavior should be that one should be able to access the names without constructing all of the underlying environments.

Update the arXiv paper with reward functions from the public code

Do you plan to publish the codes (MAML, RL^2, PEARL) to reproduce the paper?

It would be very helpful if you could make it public.

Delete gym registration

This is misleading (not how the API is meant to be used) and outdated.

Paper and results

Hi,

Thanks for open sourcing this.
I was wondering when the paper for Meta-World will be out? If the paper is not ready yet and will be released in the far future, is there any plan to release your evaluation results on these tasks for those 6 meta-rl and multi-task learning algorithms mentioned in the website?

Cannot use gym.make with external code without error

I am trying to get this package integrated with my own. I figured I could just add "from metaworld.envs.mujoco import *" at the top of a file and do gym.make to create a metaworld env with a particular name, since the init of that package registers all these envs, but that results in this error:

...
self.env = gym.make(self.config.robot.task.name)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 183, in make
return registry.make(id, **kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 125, in make
env = spec.make(kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/env/lib/python3.5/site-packages/gym/envs/registration.py", line 89, in make
env = cls(_kwargs)
File "/cvgl2/u/andreyk/projects/manipxonomy/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_pick_and_place.py", line 43, in init
**kwargs
TypeError: init() got multiple values for keyword argument 'hand_high'

That actually makes total sense, since in registrations the kwargs are specified as such:

    register(
        id='SawyerPickupEnv-v0',
        entry_point='metaworld.envs.mujoco.sawyer_xyz'
                    '.sawyer_pick_and_place:SawyerPickAndPlaceEnv',
        tags={
            'git-commit-hash': '30f23f7',
            'author': 'steven',
        },
        kwargs=dict(
            hand_low=(-0.1, 0.55, 0.05),
            hand_high=(0.0, 0.65, 0.2),
            action_scale=0.02,
            hide_goal_markers=True,
            num_goals_presampled=1000,
        )
)

And then these parameters are also set in the constructor:

class SawyerPickAndPlaceEnv(SawyerXYZEnv):
    def __init__(
            self,
            obj_low=None,
            obj_high=None,
            random_init=False,
            tasks = [{'goal': np.array([0.1, 0.8, 0.2]),  'obj_init_pos':np.array([0, 0.6, 0.02]), 'obj_init_angle': 0.3}], 
            goal_low=None,
            goal_high=None,
            hand_init_pos = (0, 0.6, 0.2),
            liftThresh = 0.04,
            rewMode = 'orig',
            rotMode='rotz',#'fixed',
            **kwargs
    ):
        self.quick_init(locals())
        hand_low=(-0.5, 0.40, 0.05)
        hand_high=(0.5, 1, 0.5)
        obj_low=(-0.5, 0.40, 0.05)
        obj_high=(0.5, 1, 0.5)
        SawyerXYZEnv.__init__(
            self,
            frame_skip=5,
            action_scale=1./100,
            hand_low=hand_low,
            hand_high=hand_high,
            model_name=self.model_name,
            **kwargs
)

It appears the kwargs are just outdated, should they just be removed? Happy to submit a quick PR with that change, it seems to fix the gym.make.

Parametric variations for multi-task training (MT-x)

Hi all. Thanks for putting the benchmark together!

One comment however on multi-task training: why are the MT-x benchmarks constrained to fixed goal locations? Including parametric variations would increase the size of the training distribution and increase the potential for transfer across tasks. As is, the optimal policies for MT-10 and MT-50 would be a discrete mixture of 10 or 50 policies, which might preclude behavior sharing across tasks. If parametric variations were to be included, then goal positions (in addition to task ID) would be provided as input to the policy -- in contrast to the meta-learning benchmarks. Thanks for taking this into consideration!

Raise an exception if a user steps past max_path_length

Scripted policies for each environment

Object and goal consistency with random_init=False

Not all tasks seem compatible with the random_init=False mode of the ML1 benchmark. For example, some tasks require some degree of consistency between the object location and the goal. For example, in sawyer_drawer_close random_init=True first samples a random object location, and then configures the goal location as the same location with an offset of 0.2 (in one of the xyz coordinates). No such consistency is enforce when random_init=False and goals are configured by hand via random_init=False. See code example below.

from metaworld.benchmarks.ml1 import ML1

def diagnose(env):
  img = env.sim.render(height=480, width=640)
  plt.figure(); plt.imshow(img[::-1])
  print('Goal location:', env._state_goal)
  print('Object location: ', env.obj_init_pos)
  print('Reward:', reward)

env = ML1.get_train_tasks('drawer-close-v1', sample_all=True)
env.active_env.random_init = False
env.active_env.initialize_camera(top_down_camera)
task_a, task_b = env.sample_tasks(2)  

env.set_task(task_a)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

env.set_task(task_b)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

env.set_task(task_a)
obs = env.reset()  # Reset environment
diagnose(env.active_env)

which clearly shows the discrepancy between goal locations a/b and the static object:

Goal location: [0.2470843  0.7214825  0.38136098]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872
Goal location: [-0.14852686  0.57654315  0.45528063]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872
Goal location: [0.2470843  0.7214825  0.38136098]
Object location:  [-0.08980742  0.89999998  0.04      ]
Reward: -0.11522688447982872

Unstable simulation while using Metaworld

Hi, I am testing my algorithm on metaworld and I am frequently getting this exception

File "/home/lthpc/anaconda3/envs/MAML/lib/python3.6/site-packages/mujoco_py/builder.py", line 359, in user_warning_raise_exception
raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 0.1250.

I met this problem on ML1 'reaching' and 'pushing' envs, and my codes use the same policy structure as PEARL, so I think my codes give out legal actions.

I would greatly appreciate it if you can provide some insight about what's going wrong. Thank you.

All environments produce observations outside of observation space.

The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of TIMESTEPS_PER_ENV, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.

import numpy as np
from metaworld.benchmarks import ML1

TIMESTEPS_PER_ENV = 1000

def main():

    # Iterate over environment names.
    for env_name in ML1.available_tasks():

        # Create environment.
        env = ML1.get_train_tasks(env_name)
        tasks = env.sample_tasks(1)
        env.set_task(tasks[0])

        # Get boundaries of observation space and initial observation.
        low = env.observation_space.low
        high = env.observation_space.high
        obs = env.reset()

        # Create list of indices of observation space whose bounds are violated.
        broken_indices = []

        # Run environment.
        for _ in range(TIMESTEPS_PER_ENV):

            # Test if observation is outside observation space.
            if np.any(np.logical_or(obs < low, obs > high)):
                current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                current_indices = current_indices.reshape((-1,)).tolist()
                for current_index in current_indices:
                    if current_index not in broken_indices:
                        broken_indices.append(current_index)
    
            # Sample action and perform environment step.
            a = env.action_space.sample()
            obs, reward, done, info = env.step(a)

        # Print out which indices of observation space were violated.
        broken_indices = sorted(broken_indices)
        print("%s broken indices: %r" % (env_name, broken_indices))

if __name__ == "__main__":
    main()

Observations are incorrectly padded with zeros

in MultiClassMultiTaskEnv a use case that routinely has to be dealt with is when multiple environments have observation spaces that are of different sizes. In ML45/MT50 the observation space is (6,) and (9,) for some spaces. The observation spaces that are of shape (6,) are supposed to be padded with zeros in order to make the observation spaces that are size( 6,) size (9,).

The issue is that where that is currently done, if a static_task_id, or one hot id is appended to the observation, then the one hot id is appended to the observation before any zeros are padded onto the observation. This leads to the one-hot id being incorrectly shifted inside the returned observation depending on the observaton_space shape of the underlying environment.

Questions regarding the simulator

Hello,

Awesome work! I have some questions about how the meta world simulator takes care of certain cases.

How does meta-world simulate grasping? Is the object assumed to be grasped if near the arm and the grip action is taken? Is there any simulation for contact or friction while grasping? Some info about the assumptions made would be very useful.
As far as I understand, the action space is of size 4 (3 for the location of the arm and 1 for gripper). Am I correct in my observation? Does the orientation of the arm have no impact on the environment? Some info about the inverse kinematics used could be very useful here. Also, if the pose of the robotic arm matters, how can one control it?

Best regards,
Ankit

metaworld only works with gym 0.15.4

gym version 0.15.5 upwards have deprecated the tags arguments in environment registration

They have specified the change here

ML1 Tasks for Constant Goals

Currently, we are trying to use specific environments in ML1 to set a goal constant per task in a MAML-setting (with env.reset() meaning that initial positions change but goal stays constant)

However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling self.set_task will update self.goal. However, when the environment is reset, self._state_goal is initially self.goal but is then assigned a randomly generated goal + a concatenation of initial reacher arm positions, which also appears to be random. When self.random_init is False, it works as intended but the starting states are constant.

We wondering if there is a way to define a task using the metaworld API such that for a given task a goal position is held constant but initial observation changes when env.reset() is called.

can you manipulate the environment configurations?

Does the package provide access to change the environment configurations? For example, change the object shape or size, change environment dynamics parameters, or change the color or texture of objects?

Making Environments Serializable

This is an issue we encountered while trying to duplicate multiple ML1 environments per worker, and hopefully someone can help us resolve this bug because it’s a blocker with our codebase.

In a meta-learning setup, each meta-batch makes use of several workers in parallel, each of which rolls out episodes from a sampled task (=setting of environment parameters). In our codebase, we use pickle to serialize environments for each worker (in order to ensure that environment/task parameters are constant for a worker).

set_task in meta-world takes a task index and obtains the goal via indexing into self.discrete_goals. After some debugging, it turns out that after pickling environments, self.discrete_goals, which is a list of 50 goal positions, is different from the value from before pickling. This is with self.random_init=False.

We are wondering if there is a recommended way to make self.discrete_goals deterministic before and after pickling an ML1 environment. (Relatedly, we would benefit from clarification on issue #24, which details what constitutes a task in ML1.) Your help is greatly appreciated!

As a working example:

env = ML1.get_train_tasks('pick-place-v1')
envs_list.append(env) 
env_pickle = pickle.dumps(env) 
while len(envs_list) < self.num_envs_per_worker:
    envs_list.append(pickle.loads(env_pickle))
print(envs_list[0].active_env.discrete_goals)
print(envs_list[1].active_env.discrete_goals)

Env[0] Discrete Goals: [array([0.05635804, 0.8268249 , 0.26080596], dtype=float32), array([-0.08220328, 0.8992955 , 0.27001566], dtype=float32), array([0.08398727, 0.8188896 , 0.05937913], dtype=float32), array([-0.03422436, 0.82531315, 0.08296145], ... ]

Env[1] Discrete Goals: [array([0.04696276, 0.8596079 , 0.12688547], dtype=float32), array([-0.05456738, 0.8163504 , 0.24694112], dtype=float32), array([-0.09329244, 0.85606927, 0.22053242], dtype=float32), array([-0.00348601, 0.81342274, 0.28464478], dtype=float32), ... ]

A small mistake(or not) in the observation space for multi-class-multi-task environment.

It looks like that the observation-space definition of the multi-class-multi-task environment is kind of confusing (I am not sure whether it's a mistake or not).

For observation space with the obs_type is with_goal_id or with_goal_and_id, current code is useing goal_id_low for high in Box, and goal_id_high for low in Box.

elif self._obs_type == 'with_goal_id' and self._fully_discretized:
    goal_id_low = np.zeros(shape=(self._n_discrete_goals,))
    goal_id_high = np.ones(shape=(self._n_discrete_goals,))
    return Box(
        high=np.concatenate([plain_high, goal_id_low,]),
        low=np.concatenate([plain_low, goal_id_high,]))
elif self._obs_type == 'with_goal_and_id' and self._fully_discretized:
    goal_id_low = np.zeros(shape=(self._n_discrete_goals,))
    goal_id_high = np.ones(shape=(self._n_discrete_goals,))
    return Box(
        high=np.concatenate([plain_high, goal_id_low, goal_high]),
        low=np.concatenate([plain_low, goal_id_high, goal_low]))

farama-foundation / metaworld Goto Github PK

metaworld's People

Contributors

Stargazers

Watchers

Forkers

metaworld's Issues

Recommend Projects

Recommend Topics

Recommend Org