rte-france / l2rpn-baselines

L2RPN Baselines: a repository to host baselines for L2RPN competitions.

Home Page: https://l2rpn-baselines.readthedocs.io/en/stable/

License: Mozilla Public License 2.0

Python 96.75% Shell 0.37% Makefile 0.09% Batchfile 0.10% PureBasic 2.68%
reinforcement-learning-algorithms powergrid-operation grid2op baseline

l2rpn-baselines's Introduction

L2RPN_Baselines

Repository hosting reference baselines for the L2RPN challenge

Install

Requirements

python3 >= 3.6

Install from PyPI

pip3 install l2rpn_baselines

Install from source

git clone https://github.com/rte-france/l2rpn-baselines.git
cd l2rpn-baselines
pip3 install -U .
cd ..
rm -rf l2rpn-baselines
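
To check that the installation succeeded, you can print the package version (this assumes the package exposes a __version__ attribute, as most Python packages do):

import l2rpn_baselines
print(l2rpn_baselines.__version__)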

Contribute

We welcome contributions: see the contribute guide for details.

Get started with a baseline

Say you want to know how you compare with the "PPO_SB3" baseline implementation in this repository (used here for the sake of this example).

Train it (optional)

As no weights are provided for this baseline by default (yet), you will first need to train it:

import grid2op
from l2rpn_baselines.PPO_SB3 import train
env = grid2op.make("l2rpn_case14_sandbox")
res = train(env, save_path="THE/PATH/TO/SAVE/IT", iterations=100)

You can find more information about the extra arguments of the "train" function in the CONTRIBUTE file.
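
For instance, you can pass the network architecture or a tensorboard log directory (the parameter names below are taken from the PPO_SB3 examples that appear in the issues further down this page; refer to the documentation for the authoritative list):

import grid2op
from l2rpn_baselines.PPO_SB3 import train
env = grid2op.make("l2rpn_case14_sandbox")
res = train(env,
            save_path="THE/PATH/TO/SAVE/IT",  # where the NN weights will be saved
            iterations=100,
            name="my_baseline",               # name under which the baseline is saved
            net_arch=[100, 100, 100],         # architecture of the NN
            logs_dir="./logs")                # where the tensorboard logs will be put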

Evaluate it

Once trained, you can reload it and evaluate its performance with the provided "evaluate" function:

import grid2op
from l2rpn_baselines.PPO_SB3 import evaluate
env = grid2op.make("l2rpn_case14_sandbox")
res = evaluate(env, load_path="THE/PATH/TO/LOAD/IT", nb_episode=10)

You can find more information about the extra arguments of the "evaluate" function in the CONTRIBUTE file.
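
You can also compare your agent with the "do nothing" agent through the grid2op Runner, as several scripts in the issues below do; a minimal sketch (assuming env is the environment created above):

from grid2op.Runner import Runner
runner = Runner(**env.get_params_for_runner())
res = runner.run(nb_episode=10, nb_process=1)
for _, chron_name, cum_reward, nb_time_step, max_ts in res:
    print("chronics at: {}\ttotal score: {:.6f}\ttime steps: {:.0f}/{:.0f}".format(
        chron_name, cum_reward, nb_time_step, max_ts))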

l2rpn-baselines's People

Contributors

bdonnot, benedikt-schesch, bwitherspoon, fmarten99, gaetanserre, jules-ljn, marota, mjothy, ml-iee, nmegel, tezirg


l2rpn-baselines's Issues

About the "DoubleDuelingDQN" baseline

Hi, when I try to train the "DoubleDuelingDQN" baseline using the example provided in the README, I encounter the following error (in the DoubleDuelingDQN.py file):
AttributeError: 'Environment_rte_case14_realistic' object has no attribute 'reward_helper'

I can't find the method "reward_helper" anywhere; the only related method is "_reward_helper".
Could you explain this? Thank you very much!

Issue when evaluating trained PPO_RLLIB agent

System information

  • Grid2op version: 1.8.1
  • l2rpn-baselines version: 0.6.0.post1
  • System: mac osx, ubuntu16.04, ...
  • Baseline concerned: PPO_RLLIB

Bug description

When I evaluate a trained PPO_RLLIB agent, the total score for each chronic is printed as 0. Even if the PPO_RLLIB agent did not train properly, the total score should still be non-zero.

The output I am getting is:

Evaluation summary:
chronics at: 0000       total score: 0.000000   time steps: 1091/8064
chronics at: 0001       total score: 0.000000   time steps: 807/8064
chronics at: 0002       total score: 0.000000   time steps: 3001/8064
chronics at: 0003       total score: 0.000000   time steps: 3/8064
chronics at: 0004       total score: 0.000000   time steps: 804/8064
Evaluation summary for Do Nothing Agent:
chronics at: 0000       total score: 622.306925 time steps: 1091/8064
chronics at: 0001       total score: 464.387165 time steps: 807/8064
chronics at: 0002       total score: 1759.294096        time steps: 3001/8064
chronics at: 0003       total score: 1.020729   time steps: 3/8064
chronics at: 0004       total score: 479.332989 time steps: 804/8064

How to reproduce

The training script I used

import grid2op
from grid2op.gym_compat import GymEnv, BoxGymObsSpace, BoxGymActSpace
from grid2op.Backend import PandaPowerBackend
from lightsim2grid import LightSimBackend
from l2rpn_baselines.PPO_RLLIB import PPO_RLLIB, train
from l2rpn_baselines.PPO_RLLIB.rllibagent import RLLIBAgent
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
import copy
import re
import ray

env_name = "l2rpn_case14_sandbox"  # or any other name
obs_attr_to_keep = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                    "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                    "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status",
                    "storage_power", "storage_charge"]
act_attr_to_keep = ["change_line_status", "change_bus", "redispatch"]

env = grid2op.make(env_name, backend=LightSimBackend())

ray.init()

train(env,
        iterations=100,  # any number of iterations you want
        learning_rate=1e-4, # set learning rate
        save_path="./saved_model/PPO_RLLIB3",  # where the NN weights will be saved
        # load_path="./saved_model/PPO_RLLIB/test", # resuming from previous saved training
        name="test",  # name of the baseline
        net_arch=[100, 100, 100],  # architecture of the NN
        save_every_xxx_steps=10,  # save the NN every 10 training steps
        env_kwargs={"reward_class": LinesCapacityReward,
                    "chronics_class": MultifolderWithCache,  # highly recommended
                    "data_feeding_kwargs": {
                        'filter_func': lambda x: re.match(".*00$", x) is not None  #use one over 100 chronics to train (for speed)
                        }
        },
        obs_attr_to_keep=copy.deepcopy(obs_attr_to_keep),
        act_attr_to_keep=copy.deepcopy(act_attr_to_keep),
        verbose=True)

env.close()
ray.shutdown()

The evaluation script used

import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from lightsim2grid import LightSimBackend  # highly recommended !
from l2rpn_baselines.PPO_RLLIB import evaluate
from grid2op.Runner import Runner

nb_episode = 5
nb_process = 1
verbose = True
env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                reward_class=LinesCapacityReward,
                backend=LightSimBackend()
                )
try:
    evaluate(env,
            nb_episode=nb_episode,
            load_path="./saved_model/PPO_RLLIB3",  # should be the same as what has been called in the train function !
            name="test",  # should be the same as what has been called in the train function !
            logs_path = "./logs/PPO_RLLIB3/",
            nb_process=1,
            verbose=verbose,
            )
    # you can also compare your agent with the do nothing agent relatively
    # easily
    runner_params = env.get_params_for_runner()
    runner = Runner(**runner_params)
    res = runner.run(nb_episode=nb_episode,
                    nb_process=nb_process
                    )
    # Print summary
    if verbose:
        print("Evaluation summary for Do Nothing Agent:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)
finally:
    env.close()

Failed to install via pypi on Windows

Hi,

I tried to install the package via pip on a Windows machine and got an error for torch.

Collecting l2rpn_baselines
  Using cached l2rpn_baselines-0.5.0.tar.gz (145 kB)
Requirement already satisfied: grid2op[optional]>=0.9.1.post1 in d:\projects\rte-grid2viz\grid2op (from l2rpn_baselines) (1.2.3)
Collecting tensorflow>=2.2.0
  Using cached tensorflow-2.3.1-cp38-cp38-win_amd64.whl (342.5 MB)
Collecting Keras>=2.3.1
  Using cached Keras-2.4.3-py2.py3-none-any.whl (36 kB)
ERROR: Could not find a version that satisfies the requirement torch>=1.4.0 (from l2rpn_baselines) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch>=1.4.0 (from l2rpn_baselines)

I'm on a Windows 10 machine [version 10.0.19041.508] with Python 3.8.6 running inside a virtualenv.

Cheers

Clean the API for nb_env and MultiEnvironment

Today, if you use "nb_env > 1" in the train function of any agent inheriting from DeepQAgent, it is not at all clear that the environment provided should be an instance of MultiEnvironment and not an instance of the regular Environment class itself.

This is confusing and should be clarified in this version, and the redundancy should be removed in a future major release:

  • remove the "nb_env" keyword argument of the "train" function
  • initialize it with 1 if an Environment is passed as env argument of the train function
  • initialize it with the appropriate number if this is a MultiEnvironment

This should solve the issue, and it will be properly documented (in the l2rpn-baselines and grid2op documentation, and in the getting_started notebooks of grid2op).
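
A minimal sketch of the proposed behaviour (the helper name is hypothetical and the nb_env attribute lookup is an assumption about what a MultiEnvironment exposes, not the actual grid2op API):

def _infer_nb_env(env):
    # hypothetical helper: derive the number of parallel environments from the
    # object passed as `env` instead of a separate "nb_env" keyword argument.
    # A plain Environment counts as 1; a MultiEnvironment is assumed to expose
    # the number of environments it wraps (attribute name illustrative only).
    return getattr(env, "nb_env", 1)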

"Impossible to use the RedispReward reward with an environment without generators cost"

System information

  • Grid2op version: 1.9.6
  • l2rpn-baselines version: 0.8.0
  • System: windows 11
  • Baseline concerned: CurriculumAgent
  • IDE: VisualStudio Code

Bug description

grid2op.Exceptions.Grid2OpException.Grid2OpException: Grid2OpException "Impossible to use the RedispReward reward with an environment without generators cost. Please make sure env.redispatching_unit_commitment_availble is available."

How to reproduce

Run the CurriculumAgent evaluate.py with the l2rpn_wcci_2022 environment using the train_full_pipeline.

Command line (if any)

Line 105 of evaluate.py


Code snippet (if any)
#!/usr/bin/env python3

# Copyright (c) 2020, RTE (https://www.rte-france.com)
# See AUTHORS.txt
# This Source Code Form is subject to the terms of the Mozilla Public License, version 2.0.
# If a copy of the Mozilla Public License, version 2.0 was not distributed with this file,
# you can obtain one at http://mozilla.org/MPL/2.0/.
# SPDX-License-Identifier: MPL-2.0
# This file is part of L2RPN Baselines, L2RPN Baselines a repository to host baselines for l2rpn competitions.
import logging
from pathlib import Path
from typing import Union, Optional

import grid2op
from grid2op.Reward import RedispReward
from grid2op.Runner import Runner
from l2rpn_baselines.utils.save_log_gif import save_log_gif

from curriculumagent.baseline.baseline import CurriculumAgent


def evaluate(
        env: grid2op.Environment.BaseEnv,
        load_path: Union[str, Path] = "C:\\Users\\mariana.souza\\data_grid2op\\l2rpn_wcci_2022",
        logs_path: Optional[Union[str, Path]] = "C:\\Users\\mariana.souza\\data_grid2op",
        nb_episode: int = 1,
        nb_process: int = 1,
        max_steps: int = -1,
        verbose: Union[bool, int] = True,
        save_gif: bool = True,
        **kwargs,
) -> Runner:
    """This is the evaluate method for the Curriculum Agent.

    Args:
        env: The environment on which the baseline will be evaluated. The default is the IEEE14 Case. For other
        environments please retrain the agent in advance.
        load_path: The path where the model is stored. This is used by the agent when calling "agent.load()"
        logs_path: The path where the agents results will be stored.
        nb_episode: Number of episodes to run for the assessment of the performance. By default, it equals 1.
        nb_process: Number of processes to be used for the assessment of the performance. Should be an integer greater
        than or equal to 1. By default, it equals 1.
        max_steps: Maximum number of timesteps each episode can last. It should be a positive integer or -1.
        -1 means that the entire episode is run (until the chronics is out of data or until a game over).
        By default, it equals -1.
        verbose: Verbosity of the output.
        save_gif:  Whether to save a gif into each episode folder corresponding to the representation of the said
        episode. Note, that depending on the environment (and the performance of your agent) this creation of the gif
        might take quite a lot of time!
        **kwargs:

    Returns:
        The experiment file consisting of the data.

    """
    runner_params = env.get_params_for_runner()
    runner_params["verbose"] = verbose

    # Create the agent (this piece of code can change)
    agent = CurriculumAgent(
        action_space=env.action_space,
        observation_space=env.observation_space,
        name="Evaluation"
    )
    # Load weights from file (for example)
    agent.load(load_path)

    # Build runner
    runner = Runner(**runner_params, agentClass=None, agentInstance=agent)

    # you can do stuff with your model here

    # start the runner

    if nb_process > 1:
        logging.warning(
            "Parallel execution is not yet available for keras models. Therefore, the run is "
            "performed with only one process."
        )
        nb_process = 1

    res = runner.run(path_save=logs_path, nb_episode=nb_episode, nb_process=nb_process, max_iter=max_steps, pbar=False)

    # Print summary
    logging.info("Evaluation summary:")
    for _, chron_name, cum_reward, nb_time_step, max_ts in res:
        msg_tmp = "\tFor chronics located at {}\n".format(chron_name)
        msg_tmp += "\t\t - cumulative reward: {:.6f}\n".format(cum_reward)
        msg_tmp += "\t\t - number of time steps completed: {:.0f} / {:.0f}".format(nb_time_step, max_ts)
        logging.info(msg_tmp)

    if save_gif:
        save_log_gif(logs_path, res)
    return res


if __name__ == "__main__":
    """
    This is a possible implementation of the eval script.
    """
    from lightsim2grid import LightSimBackend
    import grid2op

    logging.basicConfig(level=logging.INFO)
    env = grid2op.make('l2rpn_wcci_2022', backend=LightSimBackend())
    env.redispatching_unit_commitment_availble = True
    obs = env.reset()
    path_of_model = Path("C:\\Users\\mariana.souza\\data_grid2op\\l2rpn_wcci_2022")
    myagent = CurriculumAgent(
        action_space=env.action_space,
        observation_space=env.observation_space,
        model_path=path_of_model,
        path_to_data=path_of_model,
        name="Test",
    )
    env = grid2op.make('l2rpn_wcci_2022')
    out = evaluate(
        env,
        load_path=path_of_model,
        logs_path=Path(__file__).parent / "logs",
        nb_episode=10,
        nb_process=1,
        max_steps=-1,
        verbose=0,
        save_gif=True,
    )
... # Some code 

Current output

grid2op.Exceptions.Grid2OpException.Grid2OpException: Grid2OpException "Impossible to use the RedispReward reward with an environment without generators cost. Please make sure env.redispatching_unit_commitment_availble is available."

Expected output

The evaluation of the agent completes without raising the exception.

Question about Deep Q implementation

Hi

In this line

Shouldn't it be

next_a = np.argmax(target_next, axis=-1)

instead of

next_a = np.argmax(fut_action, axis=-1)

My understanding is that we should pick the action that maximizes the action value from target_next, but it looks like we pick the action from fut_action and then get the action value from target_next. Why is that, or does it not matter?
Thanks for reading
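
For context, if the baseline implements Double DQN (as the described pattern suggests), selecting the action with the online network (fut_action) and evaluating it with the target network (target_next) is exactly what Double DQN prescribes in order to reduce the overestimation bias of vanilla DQN. A runnable sketch of the two variants (dummy arrays stand in for the network outputs):

import numpy as np

rng = np.random.default_rng(0)
fut_action = rng.random((32, 8))   # Q-values of next states from the online network (batch, n_actions)
target_next = rng.random((32, 8))  # Q-values of next states from the target network (batch, n_actions)

# vanilla DQN target: select AND evaluate the action with the target network
next_q_dqn = np.max(target_next, axis=-1)

# Double DQN target: select the action with the online network, evaluate it with the target network
next_a = np.argmax(fut_action, axis=-1)
next_q_ddqn = target_next[np.arange(len(next_a)), next_a]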

Tricks for fine-tuning the hyperparameters

Hello! I am trying to implement other reinforcement learning methods to deal with the L2RPN problem, but I find that my results cannot match the performance of the DQN implementation in l2rpn-baselines.

Even when I use another library to implement DQN, if I change the hyperparameters a little I cannot get reasonable results.

So I want to know how you fine-tuned the hyperparameters of DQN for the L2RPN problem. Are there any tricks?

'_missing_two_busbars_support_info' attribute is lost when using Train

Environment

  • Grid2op version: 1.10.0
  • System: Windows 11
  • Python: 3.11.3
  • LightSim2Grid: '0.7.5'

Bug description

When using the LightSim2Grid backend, the attribute '_missing_two_busbars_support_info' is False since LightSim2Grid currently does not support more than 2 busbars per substation. Normally, this only results in a warning. However, when using the train() method from l2rpn_baselines, this attribute seems to get lost somewhere. As a result, when backend.assert_grid_correct() is run, an AttributeError is thrown:
'LightSimBackend_rte_case14_realistic_train' object has no attribute '_missing_two_busbars_support_info'

How to reproduce

import grid2op
import lightsim2grid
import l2rpn_baselines.PPO_SB3 as PPO_SB3
ENV_NAME = "rte_case14_realistic_train"
env = grid2op.make(ENV_NAME, backend=lightsim2grid.LightSimBackend())
obs = env.reset()
agent = PPO_SB3.train(env, iterations=1, logs_dir=None, save_path=None, net_arch=[100,100,100])

Error Trace

AttributeError                            Traceback (most recent call last)
Cell In[3], line 24
     22 env = grid2op.make(ENV_NAME, backend=lightsim2grid.LightSimBackend())
     23 obs = env.reset()
---> 24 agent = PPO_SB3.train(env, iterations=1, logs_dir=None, save_path=None, verbose=True, net_arch=[100,100,100])

File ...\.venv\Lib\site-packages\l2rpn_baselines\PPO_SB3\train.py:306, in train(env, name, iterations, save_path, load_path, net_arch, logs_dir, learning_rate, checkpoint_callback, save_every_xxx_steps, model_policy, obs_attr_to_keep, obs_space_kwargs, act_attr_to_keep, act_space_kwargs, policy_kwargs, normalize_obs, normalize_act, gymenv_class, gymenv_kwargs, verbose, seed, eval_env, **kwargs)
    299     agent = SB3Agent(env.action_space,
    300                      env_gym.action_space,
    301                      env_gym.observation_space,
    302                      nn_path=os.path.join(load_path, name)
    303     )
    305 # train it
--> 306 agent.nn_model.learn(total_timesteps=iterations,
    307                      callback=checkpoint_callback,
    308                      # eval_env=eval_env  # TODO
    309                      )
    311 # save it
    312 if save_path is not None:

File ...\.venv\Lib\site-packages\stable_baselines3\ppo\ppo.py:315, in PPO.learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, progress_bar)
    306 def learn(
    307     self: SelfPPO,
    308     total_timesteps: int,
   (...)
    313     progress_bar: bool = False,
    314 ) -> SelfPPO:
--> 315     return super().learn(
    316         total_timesteps=total_timesteps,
    317         callback=callback,
    318         log_interval=log_interval,
    319         tb_log_name=tb_log_name,
    320         reset_num_timesteps=reset_num_timesteps,
    321         progress_bar=progress_bar,
    322     )

File ...\.venv\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py:264, in OnPolicyAlgorithm.learn(self, total_timesteps, callback, log_interval, tb_log_name, reset_num_timesteps, progress_bar)
    253 def learn(
    254     self: SelfOnPolicyAlgorithm,
    255     total_timesteps: int,
   (...)
    260     progress_bar: bool = False,
    261 ) -> SelfOnPolicyAlgorithm:
    262     iteration = 0
--> 264     total_timesteps, callback = self._setup_learn(
    265         total_timesteps,
    266         callback,
    267         reset_num_timesteps,
    268         tb_log_name,
    269         progress_bar,
    270     )
    272     callback.on_training_start(locals(), globals())
    274     assert self.env is not None

File ...\.venv\Lib\site-packages\stable_baselines3\common\base_class.py:423, in BaseAlgorithm._setup_learn(self, total_timesteps, callback, reset_num_timesteps, tb_log_name, progress_bar)
    421 if reset_num_timesteps or self._last_obs is None:
    422     assert self.env is not None
--> 423     self._last_obs = self.env.reset()  # type: ignore[assignment]
    424     self._last_episode_starts = np.ones((self.env.num_envs,), dtype=bool)
    425     # Retrieve unnormalized observation for saving into the buffer

File ...\.venv\Lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py:77, in DummyVecEnv.reset(self)
     75 for env_idx in range(self.num_envs):
     76     maybe_options = {"options": self._options[env_idx]} if self._options[env_idx] else {}
---> 77     obs, self.reset_infos[env_idx] = self.envs[env_idx].reset(seed=self._seeds[env_idx], **maybe_options)
     78     self._save_obs(env_idx, obs)
     79 # Seeds and options are only used once

File ...\.venv\Lib\site-packages\stable_baselines3\common\monitor.py:83, in Monitor.reset(self, **kwargs)
     81         raise ValueError(f"Expected you to pass keyword argument {key} into reset")
     82     self.current_reset_info[key] = value
---> 83 return self.env.reset(**kwargs)

File ...\.venv\Lib\site-packages\grid2op\gym_compat\gymenv.py:303, in GymnasiumEnv.reset(self, seed, options)
    296 def reset(self,
    297           *,
    298           seed: Optional[int]=None,
   (...)
    301              RESET_INFO_GYM_TYPING
    302           ]:
--> 303     return self._aux_reset_new(seed, options)

File ...\.venv\Lib\site-packages\grid2op\gym_compat\gymenv.py:184, in __AuxGymEnv._aux_reset_new(self, seed, options)
    180     seed, next_seed, underlying_env_seeds = self._aux_seed_g2op(seed)
    182 # we don't seed grid2op with reset as it is done
    183 # earlier
--> 184 g2op_obs = self.init_env.reset(seed=None, options=options)
    185 gym_obs = self.observation_space.to_gym(g2op_obs)
    187 chron_id = self.init_env.chronics_handler.get_id()

File ...\.venv\Lib\site-packages\grid2op\Environment\environment.py:988, in Environment.reset(self, seed, options)
    986 self._reset_redispatching()
    987 self._reset_vectors_and_timings()  # it need to be done BEFORE to prevent cascading failure when there has been
--> 988 self.reset_grid()
    989 if self.viewer_fig is not None:
    990     del self.viewer_fig

File ...\.venv\Lib\site-packages\grid2op\Environment\environment.py:868, in Environment.reset_grid(self)
    852 """
    853 INTERNAL
    854 
   (...)
    863 
    864 """
    865 self.backend.reset(
    866     self._init_grid_path
    867 )  # the real powergrid of the environment
--> 868 self.backend.assert_grid_correct()
    870 if self._thermal_limit_a is not None:
    871     self.backend.set_thermal_limit(self._thermal_limit_a.astype(dt_float))

File ...\.venv\Lib\site-packages\grid2op\Backend\backend.py:1947, in Backend.assert_grid_correct(self)
   1944 from grid2op.Action import CompleteAction
   1945 from grid2op.Action._backendAction import _BackendAction
-> 1947 if self._missing_two_busbars_support_info:
   1948     warnings.warn("The backend implementation you are using is probably too old to take advantage of the "
   1949                   "new feature added in grid2op 1.10.0: the possibility "
   1950                   "to have more than 2 busbars per substations (or not). "
   (...)
   1958                   "handle more than 2 busbars per substation, then change it :-)\n"
   1959                   "Your backend will behave as if it did not support it.")
   1960     self._missing_two_busbars_support_info = False

Evaluation with normalised observation and action space is improper for PPO_SB3

System information

  • Grid2op version: 1.8.1
  • l2rpn-baselines version: 0.6.0.post1
  • System: osx
  • Baseline concerned: eg PPO_SB3
  • stable-baseline3 version 1.7.0

Bug description

Training with the train script with normalize_obs=True and normalize_act=True and then using the trained agent for evaluation leads to incorrect results.

How to reproduce

The train script used

import re
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
from lightsim2grid import LightSimBackend  # highly recommended for training !
from l2rpn_baselines.PPO_SB3 import train

env_name = "l2rpn_case14_sandbox"
env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend(),
                   chronics_class=MultifolderWithCache)
env.chronics_handler.real_data.set_filter(lambda x: re.match(".*00$", x) is not None)
env.chronics_handler.real_data.reset()

try:
    trained_agent = train(
          env,
          iterations=10_000,  # any number of iterations you want
          logs_dir="./logs",  # where the tensorboard logs will be put
          save_path="./saved_model",  # where the NN weights will be saved
          name="test",  # name of the baseline
          net_arch=[100, 100, 100],  # architecture of the NN
          normalize_act=True,
          normalize_obs=True,
          )
finally:
    env.close()

Evaluation script

import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from lightsim2grid import LightSimBackend  # highly recommended !
from l2rpn_baselines.PPO_SB3 import evaluate
from grid2op.Runner import Runner  # needed below for the comparison with the do-nothing agent

nb_episode = 7
nb_process = 1
verbose = True
env_name = "l2rpn_case14_sandbox"

env = grid2op.make(env_name,
                   reward_class=LinesCapacityReward,
                   backend=LightSimBackend()
                   )
try:
    evaluate(env,
            nb_episode=nb_episode,
            load_path="./saved_model",  # should be the same as what has been called in the train function !
            name="test",  # should be the same as what has been called in the train function !
            nb_process=1,
            verbose=verbose,
            )
    
    runner_params = env.get_params_for_runner()
    runner = Runner(**runner_params)
    res = runner.run(nb_episode=nb_episode,
                    nb_process=nb_process
                    )
    # Print summary
    if verbose:
        print("Evaluation summary for DN:")
        for _, chron_name, cum_reward, nb_time_step, max_ts in res:
            msg_tmp = "chronics at: {}".format(chron_name)
            msg_tmp += "\ttotal score: {:.6f}".format(cum_reward)
            msg_tmp += "\ttime steps: {:.0f}/{:.0f}".format(nb_time_step, max_ts)
            print(msg_tmp)
finally:
    env.close()

The results are very similar to the Do Nothing agent, which does not happen if normalize_obs and normalize_act are set to False during training.

Possible Solution

The issue is happening because load_path is used instead of my_path in the following two lines:

if os.path.exists(os.path.join(load_path, ".normalize_act")):

if os.path.exists(os.path.join(load_path, ".normalize_obs")):

Making this change resolved the issue for my case.
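
For reference, the corrected checks might look like the following sketch (my_path is assumed to be the directory of the named baseline, e.g. os.path.join(load_path, name); this is an illustration, not the exact code of evaluate):

import os

load_path = "./saved_model"  # as passed to evaluate
name = "test"                # as passed to evaluate
my_path = os.path.join(load_path, name)
# look for the normalization marker files in the per-baseline directory
# (my_path) instead of the parent load_path
normalize_act = os.path.exists(os.path.join(my_path, ".normalize_act"))
normalize_obs = os.path.exists(os.path.join(my_path, ".normalize_obs"))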

Add issue template

Inspired by lightsim2grid and grid2op, to make it easier for people to submit consistent and easy-to-fix issues.

Wrong name for the "_optimizer_model" attribute

In the code of the BaseDeepQ class the attribute is private (_optimizer_model), whereas the DeepQAgent class incorrectly accesses it as optimizer_model.

To fix this issue, just update this line https://github.com/rte-france/l2rpn-baselines/blob/master/l2rpn_baselines/utils/DeepQAgent.py#L591
with:

self._train_lr = self.deep_q._optimizer_model._decayed_lr('float32').numpy()

This also shows that the tests do not attempt to train the baselines at all; it would be nice to fix them as well.

PPO_RLLIB code improvement

System information

  • Grid2op version: 1.9.5
  • l2rpn-baselines version: 0.8.0
  • System: osx
  • Baseline concerned: eg PPO_RLLIB

Bug description

The PPO_RLLIB code has been updated, but there are a couple of issues:

  1. The following line is missing right after the call

    self.env_glop = grid2op.make(nm_env, backend=backend, **env_config)

    and needs to be added to make the train and eval scripts work:

    self.env_glop.chronics_handler.reset()

  2. The environment seems to be created twice: once just to convert the observation and action spaces into gym format, and then again inside the RLLIBAgent class, where the environment is rebuilt through the rllib library. If I understand correctly this uses memory for two environments, and rewriting the code to create only one would help with memory.

  3. The l2rpn_neurips_2020_track1_small environment takes a very long time to do 100 iterations with a train_batch_size of 20,000 added to env_config_ppo. These two parameters may even need to be higher to get good results. If something could be done to speed up the training, that would help with scaling to bigger networks.

How to reproduce

Execute the train and eval script here

Expected output

The train script should run without any issues, with lower memory requirements and faster training.

AttributeError: 'Tensor' object has no attribute 'numpy'

When I run the following code:

import grid2op
from l2rpn_baselines.DoubleDuelingDQN import train

env = grid2op.make()
train(env, save_path='../checkpoints', iterations=1000, verbose=True)

I get the following error:

Traceback (most recent call last):
  File "/home/alwin/PycharmProjects/l2rpn-challenge/l2rpn_baseline/train.py", line 36, in <module>
    main()
  File "/home/alwin/PycharmProjects/l2rpn-challenge/l2rpn_baseline/train.py", line 32, in main
    train(env, save_path=args.save_path, iterations=args.iterations, verbose=True)
  File "/home/alwin/miniconda3/envs/l2rpn_challenge/lib/python3.7/site-packages/l2rpn_baselines/DoubleDuelingDQN/train.py", line 96, in train
    logs_path)
  File "/home/alwin/miniconda3/envs/l2rpn_challenge/lib/python3.7/site-packages/l2rpn_baselines/DoubleDuelingDQN/DoubleDuelingDQN.py", line 274, in train
    self._batch_train(training_step, step)
  File "/home/alwin/miniconda3/envs/l2rpn_challenge/lib/python3.7/site-packages/l2rpn_baselines/DoubleDuelingDQN/DoubleDuelingDQN.py", line 355, in _batch_train
    loss = self.Qmain.train_on_batch(input_t, Q, w_batch)
  File "/home/alwin/miniconda3/envs/l2rpn_challenge/lib/python3.7/site-packages/l2rpn_baselines/DoubleDuelingDQN/DoubleDuelingDQN_NN.py", line 83, in train_on_batch
    batch_loss = self._batch_loss(y_true, y_pred)
  File "/home/alwin/miniconda3/envs/l2rpn_challenge/lib/python3.7/site-packages/l2rpn_baselines/DoubleDuelingDQN/DoubleDuelingDQN_NN.py", line 113, in _batch_loss
    self.batch_sq_error = batch_sq_error.numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'

It appears after approximately 15 seconds while training. I also have the problem with DoubleDuelingRDQN. I am using version 0.4.4 installed via pip.

Access to information on the objects from their Names

An option should be added to access or modify the variables of grid objects (power lines / loads / generators) from the names of the objects instead of their IDs.

For example, instead of the following:

change_status = action_space.get_change_line_status_vect()
change_status[0] = True

it should be possible to do:

change_status = action_space.get_change_line_status_vect()
change_status["0_3_0"] = True

Maybe with something like:

import numpy as np

class CallableVector(np.ndarray):
    def __getitem__(self, x):
        try:
            return super().__getitem__(x)
        except (IndexError, KeyError):  # x is a name (string), not an integer index
            idx = np.argmax(env.name_line == x)  # env has to be defined in order to get the names
            return super().__getitem__(idx)

And this wouldn't add anything to the computation time.

The same goes for the vectors in the Observation objects.
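
In the meantime, a possible workaround is to translate the name into an index with numpy before writing into the vector (this sketch assumes env and action_space have already been created and that "0_3_0" is a valid line name in that environment):

import numpy as np

line_id = int(np.where(env.name_line == "0_3_0")[0][0])  # name -> integer index
change_status = action_space.get_change_line_status_vect()
change_status[line_id] = True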

Error while running the baselines

Hello! I am trying to run the baselines by importing the train function but I keep getting the following error. Could someone please advise me on what I need to change.

The code:

import grid2op
from l2rpn_baselines.DoubleDuelingRDQN import train

env = grid2op.make()
train(env)

The error:
in _save_hyperparameters(self, logpath, env, steps)
     99
    100 def _save_hyperparameters(self, logpath, env, steps):
--> 101     r_instance = env.reward_helper.template_reward
    102     hp = {
    103         "lr": cfg.LR,

AttributeError: 'Environment_rte_case14_realistic' object has no attribute 'reward_helper'

Encountered a little error

When I run the train_it code, this error occurs. Please tell me how to solve it. Thank you very much.
AttributeError: module 'gym.spaces' has no attribute 'dict'

Documentation issue

When copy-pasting the documentation of the SAC train function (https://l2rpn-baselines.readthedocs.io/en/stable/SAC.html#l2rpn_baselines.SAC.train), the program does not work.

The documentation should be adapted for SAC as follows:

import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils import TrainingParam
from l2rpn_baselines.SAC import train
from l2rpn_baselines.utils import NNParam

# define the environment
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=L2RPNReward)

# use the default training parameters
tp = TrainingParam()

# this will be the list of what part of the observation I want to keep
# more information on https://grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                 "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                 "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status"]

# neural network architecture
observation_size = NNParam.get_obs_size(env, li_attr_obs_X)
sizes_q = [800, 800, 800, 494, 494, 494]  # sizes of each hidden layers
sizes_v = [800, 800]  # sizes of each hidden layers
sizes_pol = [800, 800, 800, 494, 494, 494]  # sizes of each hidden layers
kwargs_archi = {'observation_size': observation_size,
                'sizes': sizes_q,
                'activs': ["relu" for _ in range(len(sizes_q))],
                "list_attr_obs": li_attr_obs_X,
                "sizes_value": sizes_v,
                "activs_value": ["relu" for _ in range(len(sizes_v))],
                "sizes_policy": sizes_pol,
                "activs_policy": ["relu" for _ in range(len(sizes_pol))]
                }

# select some part of the action
# more information at https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.init_converter
kwargs_converters = {"all_actions": None,
                     "set_line_status": False,
                     "change_bus_vect": True,
                     "set_topo_vect": False
                     }
# define the name of the model
nm_ = "AnneOnymous"
save_path="/WHERE/I/SAVED/THE/MODEL"
logs_dir="/WHERE/I/SAVED/THE/LOGS"
try:
    train(env,
          name=nm_,
          iterations=10000,
          save_path=save_path,
          load_path=None,
          logs_dir=logs_dir,
          nb_env=1,
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi)
finally:
    env.close()

Issues with retraining from saved agent

System information

  • Grid2op version: 1.8.1
  • l2rpn-baselines version: 0.6.0.post1
  • System: mac osx
  • stable-baseline 3 version 1.7.0
  • Baseline concerned: PPO_SB3

Bug description

When I try to resume training from a saved agent, I get some errors. The saved agent, however, works properly with the evaluate function.

How to reproduce

import re
import copy
import grid2op
from grid2op.Reward import LinesCapacityReward  # or any other rewards
from grid2op.Chronics import MultifolderWithCache  # highly recommended
from lightsim2grid import LightSimBackend  # highly recommended for training !
from l2rpn_baselines.PPO_SB3 import train, evaluate

env_name = "l2rpn_case14_sandbox"
obs_attr_to_keep = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                    "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                    "time_before_cooldown_sub", "rho", "timestep_overflow", "line_status",
                    "storage_power", "storage_charge"]
act_attr_to_keep = ["redispatch"]

env = grid2op.make(env_name,
                    reward_class=LinesCapacityReward,
                    backend=LightSimBackend(),
                    chronics_class=MultifolderWithCache)
env.chronics_handler.real_data.set_filter(lambda x: re.match(".*00$", x) is not None)
env.chronics_handler.real_data.reset()


train(env,
        iterations=1000,  # any number of iterations you want
        logs_dir="./logs/PPO_SB3_test",  # where the tensorboard logs will be put
        save_path="./saved_model/PPO_SB3_test",  # where the NN weights will be saved
        name="Reload_test",  # name of the baseline
        net_arch=[200, 200, 200],  # architecture of the NN
        obs_attr_to_keep=copy.deepcopy(obs_attr_to_keep),
        act_attr_to_keep=copy.deepcopy(act_attr_to_keep),
        normalize_obs=True,
        )

evaluate(env,
            nb_episode=3,
            load_path="./saved_model/PPO_SB3_test/",  # should be the same as what has been called in the train function !
            name="Reload_test",  # should be the same as what has been called in the train function !
            logs_path = "./logs/PPO_SB3/",
            nb_process=1,
            verbose=True,
            )

train(env,
        iterations=1000,  # any number of iterations you want
        logs_dir="./logs/PPO_SB3_test",  # where the tensorboard logs will be put
        load_path="./saved_model/PPO_SB3_test/Reload_test",
        save_path="./saved_model/PPO_SB3_test",  # where the NN weights will be saved
        name="Reload_test.zip",  # name of the baseline
        obs_attr_to_keep=copy.deepcopy(obs_attr_to_keep),
        act_attr_to_keep=copy.deepcopy(act_attr_to_keep),
        normalize_obs=True,
        )

Current output

I am getting the following error message

/Users/paula/Desktop/Projects/venvs/L2PRN/lib/python3.10/site-packages/grid2op/gym_compat/box_gym_obsspace.py:765: UserWarning: The normalization of attribute "[False False False False False False]" cannot be performed entirely as there are some non finite value, or `high == `low` for some components.
  warnings.warn(f"The normalization of attribute \"{both_finite}\" cannot be performed entirely as "
/Users/paula/Desktop/Projects/venvs/L2PRN/lib/python3.10/site-packages/grid2op/gym_compat/box_gym_obsspace.py:765: UserWarning: The normalization of attribute "[False False False False False False False False False False False]" cannot be performed entirely as there are some non finite value, or `high == `low` for some components.
  warnings.warn(f"The normalization of attribute \"{both_finite}\" cannot be performed entirely as "
/Users/paula/Desktop/Projects/venvs/L2PRN/lib/python3.10/site-packages/grid2op/gym_compat/box_gym_obsspace.py:765: UserWarning: The normalization of attribute "[False False False False False False False False False False False False
 False False False False False False False False]" cannot be performed entirely as there are some non finite value, or `high == `low` for some components.
  warnings.warn(f"The normalization of attribute \"{both_finite}\" cannot be performed entirely as "
Traceback (most recent call last):
  File "/Users/paula/Desktop/Projects/RL Practice/L2RPN Aspen/Demo Notebooks/PPO_SB3_train_reload.py", line 45, in <module>
    train(env,
  File "/............../L2PRN/lib/python3.10/site-packages/l2rpn_baselines/PPO_SB3/train.py", line 305, in train
    agent.nn_model.learn(total_timesteps=iterations,
  File "/............../L2PRN/lib/python3.10/site-packages/stable_baselines3/ppo/ppo.py", line 307, in learn
    return super().learn(
  File "/............../L2PRN/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 236, in learn
    total_timesteps, callback = self._setup_learn(
  File "/............../L2PRN/lib/python3.10/site-packages/stable_baselines3/common/base_class.py", line 408, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
AttributeError: 'NoneType' object has no attribute 'reset'

Huge cleaning

  • make the minimal requirements of every available module explicit in setup.py (like for PPO_RLLIB and for PPO_SB3)
  • refactor the rllib environment to include gymEnvWithHeuristics
  • clean the gymEnvWithHeuristics
  • make PPO usable with discrete / multi discrete actions
  • make a class directly for the RLLIB env (with parallel considerations, e.g. vectorized env)
  • make a class directly for the SB3 env (with parallel considerations, e.g. vectorized env)
  • add unit tests
  • remove the tensorflow dependency
