Comments (5)

asieradzk commented on May 27, 2024

Hey.

Thanks for trying RLMatrix.

Sorry for the late reply, I was actually on holiday. I have some more features that I will be adding to RLMatrix in the next couple of weeks.

Could you show some more examples of how you're using it so I understand better what features to add?

If I understand correctly: you want to be able to poll the environment every now and then for an observation?
Or perhaps you want to have a buffer of observations and actions from real time?

Some code would be great!

alpha-wavelet commented on May 27, 2024

Hi Adrian,

I am implementing RLMatrix in the NinjaTrader trading platform. NinjaTrader provides a C# environment for writing indicators and trading strategies, and it is event driven: for example, OnBarUpdate() is where a strategy can buy or sell a security. Even though buying and selling may sound simple, it is not, and I would not chance trying to simulate it, so execution has to advance one price change at a time. For that I modified your code a little to process each observation. I am using the previous release of RLMatrix, since I could not get the latest release to work.
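
Roughly, the per-observation hook is driven from the NinjaTrader event loop like this (a minimal sketch of my setup; the agent field name is just illustrative, and TrainObservation() is my modification shown below, not stock RLMatrix):

protected override void OnBarUpdate()
{
    // Each bar/price event advances the environment by exactly one observation,
    // so the agent trains in lock-step with the data feed instead of owning the loop.
    myAgent.TrainObservation();
}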

PPOAgent:

T state;
List<Transition<T>> transitionsInEpisode;
bool initial = true;
float cumulativeReward;

public void TrainObservation()
{
    if (initial)
    {
        initial = false;
        episodeCounter++;
        // Initialize the environment and get its state
        myEnvironment.Reset();
        state = DeepCopy(myEnvironment.GetCurrentState());
        cumulativeReward = 0;
        transitionsInEpisode = new List<Transition<T>>();
    }

    // Select an action based on the policy
    (int[], float[]) action = SelectAction(state);
    // Take a step using the selected action
    float reward = myEnvironment.Step(action.Item1, action.Item2);
    // Check if the episode is done
    var done = myEnvironment.isDone;

    T nextState;
    if (done)
    {
        // If done, there is no next state
        nextState = default;
        initial = true;
    }
    else
    {
        // If not done, get the next state
        nextState = DeepCopy(myEnvironment.GetCurrentState());
    }

    if (state == null)
        Console.WriteLine("state is null");

    // Store the transition in temporary memory
    transitionsInEpisode.Add(new Transition<T>(state, action.Item1, action.Item2, reward, nextState));

    cumulativeReward += reward;
    // If not done, move to the next state
    if (!done)
    {
        state = nextState;
    }
    else
    {
        foreach (var item in transitionsInEpisode)
        {
            myReplayBuffer.Push(item);
        }

        OptimizeModel();

        //TODO: hardcoded chart
        episodeRewards.Add(cumulativeReward);

        if (myOptions.DisplayPlot != null)
        {
            myOptions.DisplayPlot.CreateOrUpdateChart(episodeRewards);
        }
    }
}

DQNAgent:

T state;
bool initial = true;
float cumulativeReward;

public void TrainObservation()
{
    if (initial)
    {
        initial = false;
        episodeCounter++;
        // Initialize the environment and get its state
        myEnvironment.Reset();
        state = DeepCopy(myEnvironment.GetCurrentState());
        cumulativeReward = 0;
    }

    // Select an action based on the policy
    var action = SelectAction(state);
    // Take a step using the selected action
    var reward = myEnvironment.Step(action);
    // Check if the episode is done
    var done = myEnvironment.isDone;

    T nextState;
    if (done)
    {
        // If done, there is no next state
        nextState = default;
        initial = true;
    }
    else
    {
        // If not done, get the next state
        nextState = DeepCopy(myEnvironment.GetCurrentState());
    }

    if (state == null)
        Console.WriteLine("state is null");

    // Store the transition directly in the replay buffer
    myReplayBuffer.Push(new Transition<T>(state, action, null, reward, nextState));

    cumulativeReward += reward;
    // If not done, move to the next state
    if (!done)
    {
        state = nextState;
    }

    // Perform one step of the optimization (on the policy network)
    OptimizeModel();

    // Soft update of the target network's weights
    // θ′ ← τθ + (1 − τ)θ′
    SoftUpdateTargetNetwork();

    if (done)
    {
        episodeRewards.Add(cumulativeReward);
        if (myOptions.DisplayPlot != null)
        {
            myOptions.DisplayPlot.CreateOrUpdateChart(episodeRewards);
        }
    }
}

It all runs, but the results are random. First of all, running on CPU uses only 2 threads, and the GPU is even slower. DQN is too slow: on average a test may have 30K events over 200K observations. Reloading a previously trained agent and running training on the same data produces random results, with no improvement at all. PPO tends to quit very early, and I have tried different parameters. In general, neither DQN nor PPO responds to the reward feedback; PPO gets stuck trying the same action.

Thank you

asieradzk commented on May 27, 2024

Okay, so I've updated the NuGet packages and the GitHub repository to the newest version of RLMatrix.

This time it works as you required: one step of the environment at a time, and better yet, we can step any number of environments simultaneously. I've updated the examples so you can have a look at the code there; it doesn't change much.

// Two CartPole environments stepped in parallel by a single PPO agent
var envppo = new List<IEnvironment<float[]>> { new CartPole(), new CartPole() };
var myAgentppo = new PPOAgent<float[]>(optsppo, envppo);

// Each call advances every environment by exactly one step
for (int i = 0; i < 10000; i++)
{
    myAgentppo.Step();
}

Let me know if this works for you :)

On another note, it's great that you are trying to use deep reinforcement learning for stock trading. I know many academics are working on this difficult task, and my own adventure with deep learning also started with trying to use it for crypto trading. Keep in mind this is going to be a daunting task; I would suggest you first have a look at examples where reinforcement learning was used successfully to win at poker.

https://www.science.org/doi/10.1126/science.aay2400

alpha-wavelet commented on May 27, 2024

Thank you for the update and the article.

I already have a GBM (on top of other tools) that forecasts the market at over 80% accuracy. The RL layer on top is there to make the trades, a task for which it is more suitable.
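
Conceptually, the wiring between the two layers might look something like this (a rough sketch only; GetGbmForecast() and the position fields are illustrative names, not part of NinjaTrader or RLMatrix):

// Hypothetical: the GBM forecast is exposed to the RL agent as part of its observation,
// so the agent only learns when and how to trade on the forecast, not how to forecast.
float[] BuildObservation()
{
    float forecast = GetGbmForecast();      // illustrative: output of the GBM forecasting layer
    float position = currentPositionSize;   // illustrative: current market exposure
    float unrealizedPnl = openProfitLoss;   // illustrative: open profit/loss on the position
    return new float[] { forecast, position, unrealizedPnl };
}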

asieradzk commented on May 27, 2024

In that case it sounds like a good use case. Hope it works out for you :)
I will close the issue now, but feel free to contact me or open a new one anytime you need help setting something up with RLMatrix. I am happy to help.
