Comments (5)

asieradzk commented on May 27, 2024

Hey.

Thanks for trying RLMatrix.

Sorry for the late reply, I was actually on holiday. I have some more features that I will be adding to RLMatrix in the next couple of weeks.

Could you show some more examples of how you're using it so I understand better what features to add?

If I understand correctly: you want to be able to poll the environment every now and then for an observation?
Or perhaps you want to have a buffer of observations and actions from real time?

Some code would be great!

alpha-wavelet commented on May 27, 2024

Hi Adrian,

I am implementing RLMatrix in the NinjaTrader trading platform. NinjaTrader provides a C# environment for writing indicators and trading strategies, and it is event driven: for example, OnBarUpdate() is where a strategy can buy or sell a security. Even though buying and selling may sound simple, it is not, and I would not chance trying to simulate it, so execution has to advance one price change at a time. For that I modified your code a little to process each observation. I am using the previous release of RLMatrix, since I could not get the latest release to work.
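
Roughly, the per-observation hook is driven from the NinjaTrader event loop like this (a minimal sketch of my setup; the agent field name is just illustrative, and TrainObservation() is my modification shown below, not stock RLMatrix):

protected override void OnBarUpdate()
{
    // Each bar/price event advances the environment by exactly one observation,
    // so the agent trains in lock-step with the data feed instead of owning the loop.
    myAgent.TrainObservation();
}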

PPOAgent:

T state;
List<Transition<T>> transitionsInEpisode;
bool initial = true;
float cumulativeReward;

public void TrainObservation()
{
    if (initial)
    {
        initial = false;
        episodeCounter++;
        // Initialize the environment and get its state
        myEnvironment.Reset();
        state = DeepCopy(myEnvironment.GetCurrentState());
        cumulativeReward = 0;
        transitionsInEpisode = new List<Transition<T>>();
    }

    // Select an action based on the policy
    (int[], float[]) action = SelectAction(state);
    // Take a step using the selected action
    float reward = myEnvironment.Step(action.Item1, action.Item2);
    // Check if the episode is done
    var done = myEnvironment.isDone;

    T nextState;
    if (done)
    {
        // If done, there is no next state
        nextState = default;
        initial = true;
    }
    else
    {
        // If not done, get the next state
        nextState = DeepCopy(myEnvironment.GetCurrentState());
    }

    if (state == null)
        Console.WriteLine("state is null");

    // Store the transition in temporary memory
    transitionsInEpisode.Add(new Transition<T>(state, action.Item1, action.Item2, reward, nextState));

    cumulativeReward += reward;
    // If not done, move to the next state
    if (!done)
    {
        state = nextState;
    }
    else
    {
        foreach (var item in transitionsInEpisode)
        {
            myReplayBuffer.Push(item);
        }

        OptimizeModel();

        //TODO: hardcoded chart
        episodeRewards.Add(cumulativeReward);

        if (myOptions.DisplayPlot != null)
        {
            myOptions.DisplayPlot.CreateOrUpdateChart(episodeRewards);
        }
    }
}

DQNAgent:

T state;
bool initial = true;
float cumulativeReward;

public void TrainObservation()
{
    if (initial)
    {
        initial = false;
        episodeCounter++;
        // Initialize the environment and get its state
        myEnvironment.Reset();
        state = DeepCopy(myEnvironment.GetCurrentState());
        cumulativeReward = 0;
    }

    // Select an action based on the policy
    var action = SelectAction(state);
    // Take a step using the selected action
    var reward = myEnvironment.Step(action);
    // Check if the episode is done
    var done = myEnvironment.isDone;

    T nextState;
    if (done)
    {
        // If done, there is no next state
        nextState = default;
        initial = true;
    }
    else
    {
        // If not done, get the next state
        nextState = DeepCopy(myEnvironment.GetCurrentState());
    }

    if (state == null)
        Console.WriteLine("state is null");

    // Store the transition directly in the replay buffer
    myReplayBuffer.Push(new Transition<T>(state, action, null, reward, nextState));

    cumulativeReward += reward;
    // If not done, move to the next state
    if (!done)
    {
        state = nextState;
    }

    // Perform one step of the optimization (on the policy network)
    OptimizeModel();

    // Soft update of the target network's weights
    // θ′ ← τθ + (1 − τ)θ′
    SoftUpdateTargetNetwork();

    if (done)
    {
        episodeRewards.Add(cumulativeReward);
        if (myOptions.DisplayPlot != null)
        {
            myOptions.DisplayPlot.CreateOrUpdateChart(episodeRewards);
        }
    }
}

It all runs, but the results are random. First of all, running on CPU uses only 2 threads, and the GPU is even slower. DQN is too slow: on average a test may have 30K events over 200K observations. Reloading a previously trained agent and running training on the same data produces random results, with no improvement at all. PPO tends to quit very early, and I have tried different parameters. In general, neither DQN nor PPO responds to the reward feedback; PPO gets stuck trying the same action.

Thank you

asieradzk commented on May 27, 2024

Okay, so I've updated the NuGet packages and the GitHub repository to the newest version of RLMatrix.

This time it works as you required: one step of the environment at a time, and better yet, we can step any number of environments simultaneously. I've updated the examples so you can have a look at the code there; it doesn't change much.

// Two CartPole environments stepped in parallel by a single PPO agent
var envppo = new List<IEnvironment<float[]>> { new CartPole(), new CartPole() };
var myAgentppo = new PPOAgent<float[]>(optsppo, envppo);

// Each call advances every environment by exactly one step
for (int i = 0; i < 10000; i++)
{
    myAgentppo.Step();
}

Let me know if this works for you :)

On another note, it's great that you are trying to use deep reinforcement learning for stock trading. I know many academics are working on this difficult task, and my own adventure with deep learning also started with trying to use it for crypto trading. Keep in mind this is going to be a daunting task; I would suggest you first have a look at examples where reinforcement learning was used successfully to win at poker.

https://www.science.org/doi/10.1126/science.aay2400

alpha-wavelet commented on May 27, 2024

Thank you for the update and the article.

I already have a GBM (on top of other tools) that forecasts the market at over 80% accuracy. The RL layer on top is there to make the trades, a task for which it is more suitable.
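
Conceptually, the wiring between the two layers might look something like this (a rough sketch only; GetGbmForecast() and the position fields are illustrative names, not part of NinjaTrader or RLMatrix):

// Hypothetical: the GBM forecast is exposed to the RL agent as part of its observation,
// so the agent only learns when and how to trade on the forecast, not how to forecast.
float[] BuildObservation()
{
    float forecast = GetGbmForecast();      // illustrative: output of the GBM forecasting layer
    float position = currentPositionSize;   // illustrative: current market exposure
    float unrealizedPnl = openProfitLoss;   // illustrative: open profit/loss on the position
    return new float[] { forecast, position, unrealizedPnl };
}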

asieradzk commented on May 27, 2024

In that case it sounds like a good use case. Hope it works out for you :)
I will close the issue now, but feel free to contact me or open a new one anytime you need help setting something up with RLMatrix. I am happy to help.
