
rl-bitcoin-trading-bot's Introduction

Working on new version

Updated 2023-11-24

This is an old project; at the time I was creating it, it was mostly for educational purposes. Now I am creating a more advanced environment for financial trading. The goal is to test with real trades 24/7 - we'll see where we get!

Link to new project: https://github.com/pythonlessons/FinRock

Reinforcement Learning Bitcoin Trading Bot

Right now I am planning to create 7 tutorials; we'll see where we can get with them. (DONE)

AI trading bot

Trying to create a Reinforcement Learning-powered Bitcoin trading bot.


rl-bitcoin-trading-bot's Issues

A Horrible mistake!

I noticed that you are resetting the env at the beginning of each episode!

By doing this you never go forward in your dataset!

Check the train_agent function.

That's why you can train your model for 50000 episodes and nothing goes wrong!! (Your whole dataset is smaller than 50000 rows; it is 23450.)
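
To illustrate what I mean, a hypothetical sketch (not the repo's code; start_position is an invented parameter) of advancing through the dataset across episodes instead of restarting from the same point:

def train_agent_sequential(env, agent, train_episodes, training_batch_size, df_len, lookback_window_size):
    """Hypothetical variant of train_agent: move the reset point forward every episode."""
    start = lookback_window_size
    for episode in range(train_episodes):
        state = env.reset(start_position=start)   # 'start_position' is an invented parameter
        # ... collect training_batch_size steps and call agent.replay(...) as usual ...
        start += training_batch_size
        if start + training_batch_size >= df_len:
            start = lookback_window_size          # wrap around once the dataset is exhausted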

Indicators

I have one question. Shouldn't the indicators use more rows? I mean, if you use 1h candles and the RSI is calculated over 14 days, shouldn't that be 24 x 14 = 336 margin rows? As I understand it, the library counts rows as periods, as if each row were a day.
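
What I mean, as a sketch, assuming the indicators come from the ta library (whose window argument counts rows of the DataFrame, not calendar days):

import pandas as pd
from ta.momentum import RSIIndicator

def add_daily_rsi(df: pd.DataFrame) -> pd.DataFrame:
    # with 1h candles, a "14-day" RSI needs 14 * 24 = 336 rows
    df["RSI_14d"] = RSIIndicator(close=df["Close"], window=14 * 24).rsi()
    return df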

get_reward() returns None, later results in exception

    # Calculate reward
    def get_reward(self):
        if self.episode_orders > 1 and self.episode_orders > self.prev_episode_orders:
            self.prev_episode_orders = self.episode_orders
            if self.trades[-1]['type'] == "buy" and self.trades[-2]['type'] == "sell":
                reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']
                self.trades[-1]["Reward"] = reward
                return reward
            elif self.trades[-1]['type'] == "sell" and self.trades[-2]['type'] == "buy":
                reward = self.trades[-1]['total']*self.trades[-1]['current_price'] - self.trades[-2]['total']*self.trades[-2]['current_price']
                self.trades[-1]["Reward"] = reward
                return reward
            # return needed
        else:
            return 0

This code sometimes returns None, and it doesn't seem to have any rhyme or reason for doing so. I noticed there's no fallback return in the nested conditional, and I believe this is the source of the error, as my comment above suggests.
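
A minimal sketch of the fix I have in mind: make sure every path returns a number, so two consecutive trades of the same type yield 0 instead of None (not necessarily how the author would want to solve it):

def get_reward(self):
    if self.episode_orders > 1 and self.episode_orders > self.prev_episode_orders:
        self.prev_episode_orders = self.episode_orders
        if self.trades[-1]['type'] == "buy" and self.trades[-2]['type'] == "sell":
            reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']
            self.trades[-1]["Reward"] = reward
            return reward
        elif self.trades[-1]['type'] == "sell" and self.trades[-2]['type'] == "buy":
            reward = self.trades[-1]['total']*self.trades[-1]['current_price'] - self.trades[-2]['total']*self.trades[-2]['current_price']
            self.trades[-1]["Reward"] = reward
            return reward
    return 0   # fallback: no completed buy/sell pair this step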

Binance implementation

Hello,

I played around with your code. Fantastic job, really... thanks for sharing.
The code and trained agent can beat not only the historical data it was trained on but also data the agent never saw.
I made some tests on bull, bear and sideways markets. The models it learned can bring a profit of 10-20% in one week (I am looking at 5-minute candlesticks and testing on a 1-week period, around 2000 points). After I got satisfying results I decided to write some code so that I can run this live with a small amount of money.

But I am a little unsure about the approach...

What I am thinking now is the following:

  • connect a Binance web socket to get 5-minute klines
  • once a candle is closed, append it to 1 week of historical data (if that historical data doesn't exist yet, download it first)
  • run a test over the last week's data
  • take the action predicted for the last candle (0, 1, 2: Hold, Buy, Sell), provided the episode's net_worth is more than 10% up, and place that order on Binance (roughly as in the sketch below)
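
Roughly what I have in mind, as a minimal sketch; the public Binance kline stream and its message fields are real, but decide and place_order are placeholders standing in for the trained agent and the exchange client:

import asyncio
import json

import websockets  # pip install websockets

STREAM = "wss://stream.binance.com:9443/ws/btcusdt@kline_5m"

async def live_loop(history, decide, place_order, initial_balance=1000.0):
    """history: list of OHLCV dicts covering the last week of 5m candles.
    decide(history) -> (action, net_worth)   # placeholder wrapping the trained agent
    place_order(action)                      # placeholder sending 0/1/2 = hold/buy/sell to Binance
    """
    async with websockets.connect(STREAM) as ws:
        async for message in ws:
            k = json.loads(message)["k"]
            if not k["x"]:                    # this candle is not closed yet
                continue
            history = history[1:] + [{        # keep a rolling 1-week window
                "Open": float(k["o"]), "High": float(k["h"]),
                "Low": float(k["l"]), "Close": float(k["c"]),
                "Volume": float(k["v"]),
            }]
            action, net_worth = decide(history)
            if net_worth > 1.10 * initial_balance:   # the ">10% net worth" condition above
                place_order(action)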

As I said I am a little unsure if this is the correct approach.

Any thoughts?

PS. I can by no means claim that I am a coder. I can read, understand and alter code, but the things I try are probably far from efficient :)

Anyway... I hope you guys have some thoughts so that we can take this fantastic code one step further...

'Adam' object has no attribute 'get_updates'

C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py:2359: UserWarning: Model.state_updates will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates are applied automatically.
updates=self.state_updates,
2023-11-30 19:38:24.841936: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 19:38:24.863100: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2023-11-30 19:38:24.898453: W tensorflow/c/c_api.cc:305] Operation '{name:'dense_15/kernel/Assign' id:566 op device:{requested: '', assigned: ''} def:{{{node dense_15/kernel/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](dense_15/kernel, dense_15/kernel/Initializer/stateless_random_uniform)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-11-30 19:38:25.816994: W tensorflow/c/c_api.cc:305] Operation '{name:'dense_7/BiasAdd' id:272 op device:{requested: '', assigned: ''} def:{{{node dense_7/BiasAdd}} = BiasAdd[T=DT_FLOAT, _has_manual_control_dependencies=true, data_format="NHWC"](dense_7/MatMul, dense_7/BiasAdd/ReadVariableOp)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
Traceback (most recent call last):
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 313, in
train_agent(train_env, visualize=False, train_episodes=20000, training_batch_size=500)
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 273, in train_agent
env.replay(states, actions, rewards, predictions, dones, next_states)
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 213, in replay
a_loss = self.Actor.Actor.fit(states, y_true, epochs=self.epochs, verbose=0, shuffle=True)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 856, in fit
return func.fit(
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 734, in fit
return fit_loop(
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 192, in model_iteration
f = _make_execution_function(model, mode)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 620, in _make_execution_function
return model._make_execution_function(mode)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 2366, in _make_execution_function
self._make_train_function()
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 2284, in _make_train_function
updates = self.optimizer.get_updates(
AttributeError: 'Adam' object has no attribute 'get_updates'
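
For reference, a workaround that is commonly suggested for this error on Keras 2.11+ (an assumption on my part, not confirmed against this repo) is to use the legacy Adam optimizer, which still implements get_updates:

# assumption: TensorFlow/Keras >= 2.11, where the new Adam dropped get_updates
from tensorflow.keras.optimizers.legacy import Adam

optimizer = Adam(learning_rate=0.0001)
# compile the Actor/Critic models with this optimizer instead of tf.keras.optimizers.Adam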

problem with train_agent and self.get_gaes

deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]

TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

Normalizing data

Here's the full traceback I am getting when normalizing the data; I added a print of the column_list:
['Unnamed: 0', 'Date', 'Open', 'Close', 'High', 'Low', 'Volume', 'sma7', 'sma25', 'sma99', 'bb_bbm', 'bb_bbh', 'bb_bbl', 'psar', 'MACD', 'RSI']
Traceback (most recent call last):
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 143, in na_arithmetic_op
result = expressions.evaluate(op, left, right)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\computation\expressions.py", line 233, in evaluate
return _evaluate(op, op_str, a, b) # type: ignore
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\computation\expressions.py", line 68, in _evaluate_standard
return op(a, b)
TypeError: unsupported operand type(s) for -: 'str' and 'float'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "RL-Bitcoin-trading-bot_7.py", line 563, in
df_nomalized = Normalizing(df[99:])[1:].dropna()
File "D:\RL-Bitcoin-trading-bot_7\utils.py", line 290, in Normalizing
df[column] = df[column] - df[column].shift(1)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
return method(self, other)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops_init_.py", line 343, in wrapper
result = arithmetic_op(lvalues, rvalues, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 190, in arithmetic_op
res_values = na_arithmetic_op(lvalues, rvalues, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 150, in na_arithmetic_op
result = masked_arith_op(left, right, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 92, in masked_arith_op
result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Any sort of help would be appreciated.
Thanks.
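
Judging from the printed column list and the final TypeError ('str' - 'str'), my guess (not confirmed) is that the string columns 'Unnamed: 0' and 'Date' are being differenced along with everything else. A sketch of restricting the differencing to numeric columns:

import pandas as pd

def normalize_diff(df: pd.DataFrame) -> pd.DataFrame:
    # only difference the numeric columns; leave 'Date', 'Unnamed: 0', etc. untouched
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols] - out[numeric_cols].shift(1)
    return out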

Errors in deduction of fees - fees are actually not applied at all

You are basically keeping the fees for yourself, but they should go to the exchange.

The line self.balance -= self.crypto_bought * current_price should be self.balance = 0, as the comment "buy with 100% of current balance" says: the balance should become zero after the buy. In your code you subtract from the balance an amount that was already reduced by the fee, so after the buy your balance still contains the fee, which should instead have gone to the exchange:

# Buy with 100% of current balance
self.crypto_bought = self.balance / current_price
self.crypto_bought *= (1-self.fees) # subtract fees
self.balance -= self.crypto_bought * current_price

The line self.crypto_held -= self.crypto_sold should be self.crypto_held = 0, since you are selling all your coins here. Instead you only subtract the amount reduced by the fee, so the "fee" amount of coins stays in your wallet, when it should go to the exchange provider:

# Sell 100% of current crypto held
self.crypto_sold = self.crypto_held
self.crypto_sold *= (1-self.fees) # subtract fees
self.balance += self.crypto_sold * current_price
self.crypto_held -= self.crypto_sold
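
Put together, the corrected version as suggested above would look like this (a sketch of the reporter's proposal, not the author's code):

# Buy with 100% of current balance
self.crypto_bought = self.balance / current_price
self.crypto_bought *= (1 - self.fees)  # the fee goes to the exchange
self.balance = 0                       # the whole balance was spent

# Sell 100% of current crypto held
self.crypto_sold = self.crypto_held
self.crypto_sold *= (1 - self.fees)    # the fee goes to the exchange
self.balance += self.crypto_sold * current_price
self.crypto_held = 0                   # all coins were sold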

Happy Ways

Hi, I've been watching your work here from time to time, and I'm happy to see someone trying to build a TensorFlow trading bot. I'm not familiar with TensorFlow myself, but I remember from my studies that we built a virtual creature that was supposed to walk forward. We gave the NN only that one job, and after many epochs it was moving forward ... in a way that made us smile.

So maybe the same can happen here: the bot can teach itself to find the best strategy, to learn how to get food (for example BTC).

I don't know if this is possible ... but that's how I think an AI should work: learn to get food in a live scenario ...

Lookahead Bias

current_price = self.df.loc[self.current_step, 'Open']

Hi Rokas,

Thank you for the lesson and fantastic work on this tutorial. I have a quick question about why you set the current price equal to the 'open' price rather than the Close. The agent is able to see the close price for any time step, but is still able to execute at the open price. Doesn't this provide information that a real trader would not have? In other words, isn't the agent able to see that, for example, the close is higher than the open, and therefore should buy? I think this introduces some form of lookahead bias. Do you know what your results look like if you set the current price to the close?

Thanks
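
For what it's worth, one way I could imagine removing the bias (a sketch of my own, not the repo's code): let the state contain only candles up to step t-1 and execute at the Open of step t, which a real trader does know at decision time.

import pandas as pd

def observation_and_price(df: pd.DataFrame, step: int, lookback: int):
    """State ends at the previous candle; the trade executes at this candle's Open,
    so the agent never sees the Close of the candle it is trading on."""
    state = df.iloc[step - lookback:step]      # rows [step-lookback, step-1]
    current_price = df['Open'].iloc[step]      # known at decision time
    return state, current_price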

The machine becomes worse, instead of improving

Hey Rokas!
Thanks for your code. I have been playing with it for a while. What I see is that the agent does not really seem to learn: it starts with results that are better than the results it achieves after a while, and in the meantime its performance oscillates, as if it were chasing a sinusoidal pattern.
Do you have any idea?

Utils - Unpacking the deque Render_data list without using a specific index

Hello,

I am a bit confused about how your TradingGraph.render function deals with the deque list render_data.

Your code is the following:

self.render_data.append([Date, Open, High, Low, Close])

# Clear the frame rendered last step
self.ax1.clear()
candlestick_ohlc(self.ax1, self.render_data, width=0.8/24, colorup='green', colordown='red', alpha=0.8)

For this to work on my end I have to reference an index of the deque, like so:

self.render_data.append([Date, Open, High, Low, Close])
        
# Clear the frame rendered last step
self.ax1.clear()
candlestick_ohlc(self.ax1, self.render_data[0], width=0.8/24, colorup='green', colordown='red', alpha=0.8)

If I don't do this and use your version, I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

or this, if I pass my data in the same way using df.loc[current_step]:

ValueError: not enough values to unpack (expected 5, got 0)

Any suggestions?

Might be Incomplete

  1. The reward calculation does not include the commission fee.
  2. Commission is also missing from the net-assets calculation; this way the fee is added and then subtracted, so it nets out to zero.
  3. It seems short selling is not allowed, which makes it hard to train on a downtrend.

Gaes fees

Hello, I was trying to work this out on my end from scratch. I got it to the point of training the model and visualizing, but it seems to crash in the middle of the training session without saving the model.

Versions:
Python: 3.8.10
tensorflow = 2.3.1
Windows = 11
No IDLE; running the script from a Windows PowerShell virtual env.

Below is the complete Traceback of the error I received.

2022-03-07 04:17:43.095316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-07 04:17:43.100610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Traceback (most recent call last):
File "RL-Bitcoin-trading-bot_7.py", line 501, in
train_multiprocessing(CustomEnv, agent, train_df, train_df_nomalized, num_worker = 5, training_batch_size=50, visualize=True, EPISODES=5)
File "D:\Mine\RLCurrent\multiprocessing_env.py", line 95, in train_multiprocessing
a_loss, c_loss = agent.replay(states[worker_id], actions[worker_id], rewards[worker_id], predictions[worker_id], dones[worker_id], next_states[worker_id])
File "RL-Bitcoin-trading-bot_7.py", line 121, in replay
advantages, target = self.get_gaes(rewards, dones, np.squeeze(values), np.squeeze(next_values))
File "RL-Bitcoin-trading-bot_7.py", line 93, in get_gaes
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
File "RL-Bitcoin-trading-bot_7.py", line 93, in
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

Any sort of help is highly appreciated. If needed I'll post code snippets as well for more clarity.
Thanks.

Memory leak

Hello,

First, thank you so much for your source code.

But there is a memory leak issue when training: the run stops at around the 2000th episode.

My PC has 32 GB of RAM.

Below is the last part of the output:

...
episode: 2074 [000020] worker: 0 net worth: 1035202.66 average: 1177863.37 orders: 38
episode: 2075 [000020] worker: 5 net worth: 1458406.87 average: 1163223.70 orders: 10
episode: 2076 [000020] worker: 5 net worth: 467591.76 average: 1156877.23 orders: 1
unable to alloc 3840000 bytes

Not clear if it is learning

Hello!

I was reviewing your code base and considered using it as part of a demo for a class I teach. The initial run didn't seem to be learning much, so I went into the AddIndicators function and added 2 new indicators:

# Add a magic indicator that will tell you tomorrow's return
df['magic'] = df['Close'].pct_change().shift(-1)
df['magic8'] = df['Close'].pct_change(8).shift(-8)

So the idea here is that I tell the bot what the return will be in 1 hour and in 8 hours. With this information, a human trader could make a huge return. I've done this same test on 3 other code bases, and only 1 could actually learn it.

After implementing this change and re-running the bot, for thousands of episodes, it does not seem to have learned much. The average return and episodic return aren't zooming up like I would expect.

episode: 26340 worker: 21 net worth: 943.79 average: 1046.29 orders: 2
episode: 26341 worker: 14 net worth: 846.84 average: 1044.38 orders: 6
episode: 26342 worker: 26 net worth: 1019.90 average: 1045.11 orders: 34
episode: 26343 worker: 16 net worth: 1661.54 average: 1051.10 orders: 92
episode: 26344 worker: 20 net worth: 1020.38 average: 1051.17 orders: 49
episode: 26345 worker: 24 net worth: 989.14 average: 1052.00 orders: 3
episode: 26346 worker: 19 net worth: 990.24 average: 1052.65 orders: 4

This is very similar to the first few episodes, except that the number of orders has generally declined.

This might be due to the convolutional layer 'blurring out' or averaging away the bot's ability to notice that one of its features is very helpful.

Expected behavior: Return should get much higher when the bot is provided with perfect information from the future.
Actual behavior: Doesn't seem to change anything.

Thank you!

Mistake

Hi,

Big mistake right here :
https://github.com/pythonlessons/RL-Bitcoin-trading-bot/blob/main/RL-Bitcoin-trading-bot_7/RL-Bitcoin-trading-bot_7.py#L309

Basically you set your reward as the difference in value between the trades. Which is fine.

The value of a trade is the amount of the asset multiplied by its price. Fine.

However, on this particular line you have made a mistake (by copy-pasting your code, I believe; it happens): you set the amount of the buy trade to be the same as the amount of the previous sell trade (= bypassing the fees).

It is only one character at line 309 ... and it dramatically changes your results and the way the model learns; see the sketch below.
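
If I read the get_reward() snippet quoted in an earlier issue correctly, the line in question is the buy-after-sell branch, and the one-character change would be the following (my interpretation, not confirmed by the author):

# before: the buy leg is valued with the amount of the previous sell trade (trades[-2])
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']

# after: value the buy leg with its own amount (trades[-1]), so the fee is no longer bypassed
reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-1]['total']*self.trades[-1]['current_price']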

Still, I would like to thank you for providing your lessons and code online. Sincerely.
I am doing other personal things using PyTorch and you're inspiring me a lot.
It is a great experience for me to try to translate your code and other YouTubers' work into PyTorch.
You have no idea how much faster I learned thanks to you.
I like your blog posts and mathematical explanations a lot.

Keep it up.

buy and sell placed on same day when visualizing

Not sure if it's a graphical error or not, but the red and green buy/sell markers end up being placed on the same candlestick for me. I've been through the code and it doesn't seem like that should be possible.

Problem with visualizing img with OpenCV in utils.py

I run the default script on Windows 10 with:
test_multiprocessing(CustomEnv, CustomAgent, test_df, test_df_nomalized, num_worker = 16, visualize=True, (...)
in main.py.

When it gets to this line in utils.py:
img = img.reshape(self.fig.canvas.get_width_height()[::-1] + (3,))
I get this error:

Traceback (most recent call last):
  File "C:\Users\Tomek\anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\multiprocessing_env.py", line 35, in run
    self.env.render(self.visualize)
  File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\RL-Bitcoin-trading-bot_7.py", line 324, in render
    img = self.visualization.render(self.df.loc[self.current_step], self.net_worth, self.trades)
  File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\utils.py", line 231, in render
    img  = img.reshape(self.fig.canvas.get_width_height()[::-1] + (3,))
ValueError: cannot reshape array of size 15360000 into shape (800,1600,3)

The prints are as follows:

print("2", img, "len", len(img))
print("3", self.fig.canvas.get_width_height())
print("4", self.fig.canvas.get_width_height()[::-1])
2 [255 255 255 ... 255 255 255] len 15360000
3 (1600, 800)
4 (800, 1600)

What should I do? figsize is:

# figsize attribute allows us to specify the width and height of a figure in unit inches
fig = plt.figure(figsize=(16,8)) 
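
The numbers suggest the real canvas buffer is 3200 x 1600 pixels (15,360,000 / 3), i.e. twice what get_width_height() reports, which usually points at Windows display scaling (DPI not 100%). One workaround to try (a sketch, assuming the Agg canvas that utils.py renders to) is to take the shape from the buffer itself:

import numpy as np

# read the RGBA buffer, which already carries the true (height, width, 4) shape
# even when DPI scaling makes it larger than figsize * dpi
buf = np.asarray(self.fig.canvas.buffer_rgba())
img = buf[:, :, :3]   # drop the alpha channel -> RGB image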

Random Choice when doing prediction

I am not sure why you did the random prediction this way: action = np.random.choice(self.action_space, p=prediction). You are also picking a random outcome as the chosen action during both testing and training. I can understand why during training, but why during testing as well?

Suggestion: Realistic order prices/fees

Hi!

I read your articles on pylessons, thanks a lot! I have a suggestion to make, since I feel like you are missing two very important parts of the "trading environment": the bid-ask spread and exchange fees. Especially the first one will MASSIVELY impact how your algorithm performs in real life. As far as I can tell from your code, you are currently determining the "current price" by just selecting the Open price of the current time window.

I don't know how familiar you are with crypto trading or trading in general - I am a beginner at this myself, so don't take everything here to be correct - but when you want to buy or sell something, you have to find someone who is willing to buy or sell at the price you suggest. In trading, this is usually done by your broker (i.e. the exchange in crypto) via the orderbook. Orders (bids and asks) with a target price for buying or selling are placed there, and if there is a matching price a trade is made. However, the difference between the current stock/coin price and the bids and asks can be very big, and it differs dramatically from exchange to exchange and over time. The difference between the best bid and ask price in the orderbook is called the bid-ask spread. For Bitcoin you can, for example, look at historical bid-ask spreads here: https://data.bitcoinity.org/markets/spread/2y/USD?c=e&st=log&t=l - however, for smaller coins the spread is usually much higher since the market isn't as big. For MOB/USD on the FTX exchange, for example, the bid-ask spread is currently 0.8%. That is HUGE. It means that every time you buy or sell you lose 0.8% of your money.

So, what happens in reality when you want to buy/sell something?

You usually have a couple of choices to place an order, but the two most basic choices are:

  1. "Market order" which guarantees a sale by matching the bid price in the orderbook. For selling this means, it will be sold for the highest bid price in the order book, for buying it means it will be bought for the lowest ask price. It is usually guaranteed to be fulfilled but you don't have precise control over the price: If the price changes while you place the order and the order being fulfilled, you may loose or even gain a bit. I think the easiest way to model this is to take a random value between +-1% of the bid/ask price. Also if the highest bid or ask in the orderbook does not cover your whole order, the lower/higher ones will be taken, which is bad for you.
  2. "Limit order" where you can set a price yourself at which you want to sell. This means you do have control over the price but it's MUCH harder to get an immediate sale like that. The exact trading procedure is a science for itself. If you place your order at the current highest bid/lowest ask price in the orderbook, it usually will be fulfilled quickly, but there is no guarantee on that: If the price rises or falls, nobody will probably want your order.

Now, on top of that, you have trading fees. They are exchange-dependent, and there are two different kinds:

  1. "Maker fees": a percentage you have to pay when you put an order in the orderbook and someone else's order gets matched against it. They are usually relatively low and on some exchanges can even be negative! That means you even get money for trading. However, this requires that your order is not immediately matched against another order already in the orderbook. Maker fees on FTX, for example, are 0.025%; on other exchanges like Kraken they are higher - 0.16%, and 0.1% on Binance. This means: no instant trade! So for our two order types, maker fees don't apply to the market order, and to the limit order only if your order does not get matched immediately against an order in the orderbook. For that we have the
  2. "Taker fees": a percentage you have to pay when you put an order in the orderbook and it gets immediately matched against someone else's order (you put a bid there and it gets immediately matched against an ask in the orderbook). This is true for market orders --> market orders always have the taker fee. They are usually higher than the maker fees - on FTX 0.075%, on Kraken 0.26%, and on Binance currently 0.1% as well.

There are tricks to get maker fees with limit orders too (the POST option) but they usually don't guarantee an immediate sale either.

So what does all that mean?

The performance of your algorithm will be heavily impacted by the bid-ask spread of the market it tries to trade. The exchange fees are more or less static, but the bid-ask spread is a problem. For comparison I implemented some simple hand-written bots and traded on a sample of the DOGE/USD pair with them - one time assuming 0.1% cost per trade and one time 1% cost. It literally meant the difference between a 457.68% increase with the assumed 0.1% trading cost and having 0.00003% of the original 100% left with the 1% cost, because the bots were trading far too often for a 1% cost per trade to be feasible.

So the bid-ask spread and proper order handling should be part of your model to make it realistic, and something your agents are able to observe and learn from. A rough sketch of what a simulated market order could look like follows below.
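
To make the suggestion concrete, a minimal sketch of how a simulated market order with a bid-ask spread and taker fee could be added to the environment (the numbers are illustrative, not taken from any specific exchange):

def simulate_market_order(side: str, mid_price: float, qty: float,
                          spread_pct: float = 0.008, taker_fee_pct: float = 0.00075) -> float:
    """Rough model of a market order: cross half the spread, then pay the taker fee.
    Returns the cash paid (for a buy) or received (for a sell)."""
    half_spread = mid_price * spread_pct / 2
    fill_price = mid_price + half_spread if side == "buy" else mid_price - half_spread
    gross = fill_price * qty
    fee = gross * taker_fee_pct
    return gross + fee if side == "buy" else gross - fee

# e.g. buying 0.1 BTC at a 40,000 mid price costs slightly more than 4,000:
cost = simulate_market_order("buy", 40_000.0, 0.1)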

Why Random in Action Selection?

Hi,
I just finished your tutorial, and it's really interesting.
I'm just wondering why you are using np.random.choice() to select your action from the prediction. Is it not better to take the maximum value instead of choosing randomly?

predictions_list = agent.Actor.actor_predict(np.reshape(state, [num_worker]+[_ for _ in state[0].shape]))
actions_list = [np.random.choice(agent.action_space, p=i) for i in predictions_list]
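
For comparison, the greedy (deterministic) choice for evaluation would just be the following one-liner; whether it actually performs better here is exactly the question:

actions_list = [int(np.argmax(p)) for p in predictions_list]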
