This is an issue we encountered while trying to duplicate multiple `ML1` environments per worker. Hopefully someone can help us resolve it, because it's a blocker for our codebase.
In a meta-learning setup, each meta-batch makes use of several workers in parallel, each of which rolls out episodes from a sampled task (i.e., a setting of environment parameters). In our codebase, we use `pickle` to serialize environments for each worker, in order to ensure that environment/task parameters stay constant for a worker. `set_task` in meta-world takes a task index and obtains the goal by indexing into `self.discrete_goals`. After some debugging, it turns out that after pickling an environment, `self.discrete_goals`, which is a list of 50 goal positions, differs from its value before pickling. This is with `self.random_init=False`.
We are wondering if there is a recommended way to make `self.discrete_goals` deterministic before and after pickling an `ML1` environment. (Relatedly, we would benefit from clarification on issue #24, which discusses what constitutes a task in `ML1`.) Your help is greatly appreciated!
As a working example (inside our worker class, where `self.num_envs_per_worker` is defined):

```python
import pickle

env = ML1.get_train_tasks('pick-place-v1')
env_pickle = pickle.dumps(env)

envs_list = [env]
while len(envs_list) < self.num_envs_per_worker:
    envs_list.append(pickle.loads(env_pickle))

print(envs_list[0].active_env.discrete_goals)
print(envs_list[1].active_env.discrete_goals)
```
```
Env[0] Discrete Goals: [array([0.05635804, 0.8268249 , 0.26080596], dtype=float32), array([-0.08220328, 0.8992955 , 0.27001566], dtype=float32), array([0.08398727, 0.8188896 , 0.05937913], dtype=float32), array([-0.03422436, 0.82531315, 0.08296145], ... ]
Env[1] Discrete Goals: [array([0.04696276, 0.8596079 , 0.12688547], dtype=float32), array([-0.05456738, 0.8163504 , 0.24694112], dtype=float32), array([-0.09329244, 0.85606927, 0.22053242], dtype=float32), array([-0.00348601, 0.81342274, 0.28464478], dtype=float32), ... ]
```
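Our current guess is that the goals get resampled from the global RNG when the environment is unpickled (e.g., in `__setstate__`). Below is a minimal, self-contained sketch of that failure mode and the workaround we are currently considering, which is to copy the goals back from the template env after `pickle.loads`. `ToyEnv` here is a hypothetical stand-in, not meta-world's actual implementation:

```python
import pickle
import numpy as np

class ToyEnv:
    """Stand-in for an env that resamples its goals on unpickling."""

    def __init__(self, seed=None):
        rng = np.random.RandomState(seed)
        self.discrete_goals = [rng.uniform(-0.1, 0.1, size=3) for _ in range(50)]

    def __getstate__(self):
        # Mimic an env that does not serialize its goal list.
        # (Must be truthy, otherwise pickle skips __setstate__.)
        return {"version": 1}

    def __setstate__(self, state):
        # Goals are re-drawn from an unseeded RNG -> nondeterministic.
        self.__init__()

template = ToyEnv(seed=0)
restored = pickle.loads(pickle.dumps(template))

# Without intervention, the restored goals differ from the template's.
assert not np.allclose(template.discrete_goals[0], restored.discrete_goals[0])

# Workaround sketch: overwrite the goals from the template after unpickling.
restored.discrete_goals = [g.copy() for g in template.discrete_goals]
assert all(np.array_equal(a, b)
           for a, b in zip(template.discrete_goals, restored.discrete_goals))
```

If `discrete_goals` is indeed resampled on load, copying it over (or re-seeding the env's RNG before unpickling) would make `set_task` deterministic across worker copies, but we would prefer an officially supported way to do this.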