
The default random_collect_size is not compatible with episode collector (di-engine issue, closed)

opendilab commented on August 22, 2024

The default random_collect_size is not compatible with episode collector

from di-engine.

Comments (7)

PaParaZz1 commented on August 22, 2024

random_collect_size is counted in samples (n_sample), so it is not compatible with the episode collector. Why do you need to collect data by episode? If necessary, we will add an n_episode option to this function.


tianhan4 commented on August 22, 2024

I do not intend to use random collection. However, random_collect_size is hardcoded in 'serial_entry', so I think I need to either write my own entry function or delete the default random_collect_size parameter from the policies.


PaParaZz1 commented on August 22, 2024

OK, you can set random_collect_size=0 in your config to disable this attribute.

BTW, we set random_collect_size=1000 in the SACPolicy default config so that the early phase of off-policy training is more stable, for example in some MuJoCo environments. You can adjust it for your own environment.
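As a minimal sketch of the override suggested above: the dict below is a trimmed-down, hypothetical config, and only the random_collect_size key comes from this thread. The nesting under `policy` and the helper `needs_random_collect` are assumptions for illustration, not DI-engine functions.

```python
# Hypothetical, trimmed-down user config; only random_collect_size is taken from the thread.
user_config = dict(
    policy=dict(
        # 0 disables the initial random collection phase.
        random_collect_size=0,
    ),
)


def needs_random_collect(cfg: dict) -> bool:
    """Illustrative helper (not part of DI-engine): True if random collection is enabled."""
    return cfg["policy"].get("random_collect_size", 0) > 0


print(needs_random_collect(user_config))
```

Setting the value back to a positive number, such as the 1000 mentioned for SACPolicy, would make the helper return True again.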


PaParaZz1 commented on August 22, 2024

This comparison experiment shows the importance of random_collect_size in the MuJoCo HalfCheetah environment:
random_collect_size=10000 vs. random_collect_size=0


tianhan4 commented on August 22, 2024

OK, thank you! I use n_episode because my environment is a real-time environment (congestion control) that cannot be paused step by step. So I think it would be useful to interpret random_collect_size as a number of episodes when n_episode is used.
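A rough sketch of that suggestion, under the assumption that the collector kind is known as a string: the helper `random_collect_kwargs` and the "episode"/"sample" labels are hypothetical, and only the n_episode and n_sample names come from this thread. It is not how DI-engine implements it.

```python
def random_collect_kwargs(collector_type: str, random_collect_size: int) -> dict:
    """Choose how to interpret random_collect_size for a given collector kind.

    Hypothetical helper: returns the keyword argument a collect call would
    receive, or an empty dict when random collection is disabled.
    """
    if random_collect_size <= 0:
        return {}
    if collector_type == "episode":
        # Episode collector: treat the size as a number of whole episodes.
        return {"n_episode": random_collect_size}
    if collector_type == "sample":
        # Sample collector: treat the size as a number of transitions.
        return {"n_sample": random_collect_size}
    raise ValueError(f"unknown collector type: {collector_type!r}")


print(random_collect_kwargs("episode", 8))
print(random_collect_kwargs("sample", 1000))
```

Note that the same number means very different amounts of data in the two modes, so a default tuned for samples would likely need to be lowered when it is read as episodes.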


tianhan4 commented on August 22, 2024

FYI: for my own use, I implemented a new noise generator that produces uniform random noise and Gaussian noise during the first steps. This works regardless of whether I use n_episode or n_sample.
I also wrote a new base noise class, because I want to pass the action as an argument in order to generate action-dependent noise, for example a purely random output.


from abc import ABC, abstractmethod

import numpy as np
import torch


class BaseNoise2(ABC):
    """Noise base class whose __call__ also receives the action, so noise can depend on it."""

    def __init__(self) -> None:
        super().__init__()

    @abstractmethod
    def __call__(self, action: torch.Tensor, shape: tuple, device: str) -> torch.Tensor:
        raise NotImplementedError


class HybridNoise(BaseNoise2):
    """Mostly uniform random output early on, then Gaussian noise with a decaying sigma."""

    def __init__(self, mu: float = 0.0, sigma: float = 1.0,
                 noise_exp: int = 10000, random_exp: int = 10000, noise_end: float = 0.05) -> None:
        super().__init__()
        assert sigma >= 0, "HybridNoise's sigma should be non-negative."
        assert noise_exp > 0 and random_exp > 0, "noise_exp and random_exp should be positive."
        self._mu = mu
        self._sigma = sigma
        self._noise_exp = noise_exp
        self._random_exp = random_exp
        self._noise_end = noise_end
        self._noise_interval = sigma - noise_end
        self._random_step = random_exp
        assert self._noise_interval >= 0, "HybridNoise's sigma should not be smaller than the end sigma."

    def __call__(self, action: torch.Tensor, shape: tuple, device: str) -> torch.Tensor:
        # Count down the random phase; the chance of a purely random output decays linearly to 0.
        if self._random_step > 0:
            self._random_step -= 1
        if np.random.uniform(0, 1) < self._random_step / float(self._random_exp):
            # Uniform sample in [-1, 1). Subtracting the action means that, if the noise is
            # added to the action, the result is the uniform sample itself.
            random_noise = torch.rand(shape, device=device) * 2 - 1
            return random_noise - action
        # Anneal sigma linearly toward noise_end, one step per Gaussian call.
        if self._sigma > self._noise_end:
            self._sigma = max(self._sigma - self._noise_interval / self._noise_exp, self._noise_end)
        return torch.randn(shape, device=device) * self._sigma + self._mu
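To make the schedule of HybridNoise concrete, here is a small sketch in plain Python, assuming the default arguments from the class above. Both function names are made up for illustration. The sigma formula is an idealized version of the per-call update, so it matches the class only up to floating-point rounding.

```python
def random_output_probability(call_index: int, random_exp: int = 10000) -> float:
    """Chance that the call_index-th call (1-based) returns the pure random output."""
    remaining = max(random_exp - call_index, 0)
    return remaining / float(random_exp)


def sigma_after_gaussian_calls(n_calls: int, sigma: float = 1.0,
                               noise_end: float = 0.05, noise_exp: int = 10000) -> float:
    """Approximate sigma after n_calls Gaussian-branch calls (linear decay, then flat)."""
    step = (sigma - noise_end) / noise_exp
    return max(sigma - n_calls * step, noise_end)


for k in (1, 5000, 10000, 20000):
    print(k, random_output_probability(k), sigma_after_gaussian_calls(k))
```

Because sigma only shrinks on calls that take the Gaussian branch, the decay in wall-clock steps is slower than noise_exp suggests while the random phase is still active.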


PaParaZz1 commented on August 22, 2024

> Ok, thank you! I use n_episode because my environment is a real-time environment (congestion control) and cannot pause step by step. Thus, I personally think it's useful to transform the random_collect_size parameter to the number of episodes when n_episode is used.

We have fixed this problem in #190, so you can now use the episode collector for random collection at the beginning of training.

