dorienh / jesse

Python 76.24% Jupyter Notebook 23.76%

jesse's Issues

Hop size when reading data

I suspect that the hop size is greater than 1 in the data preloading.

Can we check that the hop size is 1?

Looking at the X_train tensors, they do not seem to be hopped. But maybe I am missing something and the order is randomized.

Wish there were comments in this class!



[[0.4495, 0.3710, 0.4673,  ..., 0.0000, 0.0000, 1.0000],
        [0.5688, 0.4324, 0.5621,  ..., 1.0000, 0.2884, 0.0000],
        [0.9572, 0.7592, 1.0000,  ..., 1.0000, 0.7755, 1.0000],
        ...,
        [0.9235, 0.7494, 0.9281,  ..., 1.0000, 0.7755, 1.0000],
        [0.8869, 0.7838, 0.9771,  ..., 1.0000, 0.7755, 1.0000],
        [0.6086, 0.4865, 0.4281,  ..., 1.0000, 0.6783, 0.0000]],
       dtype=torch.float64)


[[0.4740, 0.4275, 0.5392,  ..., 1.0000, 0.3670, 0.0000],
        [0.6177, 0.4693, 0.5817,  ..., 1.0000, 0.6783, 0.0000],
        [0.5199, 0.4177, 0.5163,  ..., 1.0000, 0.2884, 0.0000],
        ...,
        [0.1835, 0.1400, 0.1569,  ..., 0.0000, 0.0000, 0.3862],
        [0.3425, 0.2776, 0.1830,  ..., 0.0000, 0.0000, 0.0000],
        [0.5719, 0.5283, 0.6536,  ..., 1.0000, 0.6783, 0.0000]],
       dtype=torch.float64)
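One way to test this is sketched below. The helper name `looks_like_hop_one` is made up for illustration, and the check only makes sense if the windows are in their original order (not shuffled) and all come from a single file. With hop size 1, each window should equal the previous one shifted by one time step.

```python
import numpy as np

def looks_like_hop_one(X):
    """Return True if every window is the previous window shifted by one step.

    X has shape (n_windows, window_length, n_features).
    Hypothetical helper, not part of the repository.
    """
    X = np.asarray(X)
    if len(X) < 2:
        return True
    # With hop size 1, rows 1..end of window i equal rows 0..end-1 of window i+1.
    return bool(np.array_equal(X[:-1, 1:], X[1:, :-1]))

# Toy series: 10 time steps, 3 features.
series = np.arange(30, dtype=float).reshape(10, 3)
hop1 = np.stack([series[i:i + 4] for i in range(0, 7)])     # hop size 1
hop2 = np.stack([series[i:i + 4] for i in range(0, 7, 2)])  # hop size 2

print(looks_like_hop_one(hop1), looks_like_hop_one(hop2))
```

For a CPU torch tensor, passing `X_train.numpy()` should work the same way.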

Input - not across files

When the data loader samples x and y inputs to feed the network, x should not go across files.

So I'm talking about x with length x_days_ago and 1 y. This is the network input (times batch_size, of course).

This should use hop size 1, so overlapping windows. But each X should always belong to a single file.

I'm afraid the

np.row_stack()

does not respect this.
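A minimal sketch of the alternative: build the windows inside each file first, and only then concatenate. The function names are hypothetical, and pairing each window with the target of its last row is an assumption about how y is aligned.

```python
import numpy as np

def windows_for_file(features, targets, window):
    """Build hop-size-1 windows from ONE file (hypothetical helper)."""
    features = np.asarray(features)
    targets = np.asarray(targets)
    n_windows = len(features) - window + 1
    if n_windows <= 0:
        # File too short for even one window: contribute nothing.
        return (np.empty((0, window) + features.shape[1:]),
                np.empty((0,) + targets.shape[1:]))
    X = np.stack([features[i:i + window] for i in range(n_windows)])
    # Assumption: each window is labelled with the target of its last row.
    y = targets[window - 1:]
    return X, y

def build_dataset(files, window):
    """Window each file separately, then concatenate the windows."""
    parts = [windows_for_file(f, t, window) for f, t in files]
    X = np.concatenate([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y

# Two toy files, marked by constant feature values 0.0 and 1.0.
file_a = (np.full((5, 2), 0.0), np.zeros((5, 1)))
file_b = (np.full((6, 2), 1.0), np.ones((6, 1)))
X, y = build_dataset([file_a, file_b], window=3)
print(X.shape, y.shape)
```

Stacking the rows of both files first and windowing afterwards would give 9 windows here, two of which mix rows from both files. Windowing per file gives 3 + 4 = 7 windows, none of which cross a file boundary.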

Predict - Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

The predict function still has the loss function error, I believe.

----> 3 model.predict(testdata=process.X_test, target=process.y_test)
      4 
      5 

7 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    258                             _single(0), self.dilation, self.groups)
    259         return F.conv1d(input, weight, bias, self.stride,
--> 260                         self.padding, self.dilation, self.groups)
    261 
    262     def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
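This error usually means the model weights were moved to the GPU but the input tensor is still on the CPU. A minimal sketch of one way around it, using a stand-in `Conv1d` model and a hypothetical helper name rather than the repository's own classes:

```python
import torch
from torch import nn

# Stand-in model; the real one is the project's WaveNet.
model = nn.Conv1d(in_channels=4, out_channels=2, kernel_size=3)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def forward_on_model_device(model, data):
    """Move the input to wherever the model's weights live, then run it."""
    model_device = next(model.parameters()).device
    model.eval()
    with torch.no_grad():
        return model(data.float().to(model_device))

x = torch.randn(8, 4, 16)  # CPU tensor: (batch, channels, length)
out = forward_on_model_device(model, x)
print(out.shape, out.device)
```

Reading the device from the model's parameters avoids relying on a separately stored device attribute that may be out of sync.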

LSTM still has an issue on GPU (WaveNet is OK except prediction)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-20bc00643d0f> in <module>()
      5     optimizer=optimizer,
      6     criterion=criterion,
----> 7     checkpointdir=checkpointdir,
      8 )

4 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    660         if batch_sizes is None:
    661             result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 662                               self.dropout, self.training, self.bidirectional, self.batch_first)
    663         else:
    664             result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,

RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu
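A common cause of this message is an initial hidden state created with `torch.zeros(...)` inside `forward`, which lands on the CPU by default. Whether that is what happens in this repository is an assumption. The sketch below uses a made-up `TinyLSTM` class and creates the states from the input tensor, so they inherit its device and dtype.

```python
import torch
from torch import nn

class TinyLSTM(nn.Module):
    """Illustrative model, not the repository's LSTM."""

    def __init__(self, n_features, hidden_size, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # new_zeros creates the states on the same device and dtype as x.
        shape = (self.lstm.num_layers, x.size(0), self.lstm.hidden_size)
        h0 = x.new_zeros(shape)
        c0 = x.new_zeros(shape)
        out, _ = self.lstm(x, (h0, c0))
        # Use the last time step and squash to a probability.
        return torch.sigmoid(self.head(out[:, -1]))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lstm_model = TinyLSTM(n_features=6, hidden_size=8).to(device)
batch = torch.randn(5, 14, 6, device=device)  # (batch, window, features)
probs = lstm_model(batch)
print(probs.shape)
```

Omitting the hidden state entirely, `self.lstm(x)`, is another option, since PyTorch then creates zero states on the input's device itself.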

Clarifying - test / validation

Just a little question to clarify the validation and test sets. Is the validation set used? Or is the 'validation loss' shown in the log during training actually the loss on the test set (i.e. it's not used during training, just to keep track)? The latter is fine, no validation needed if it's not used; if it is, all the better :)

RuntimeError: CUDA error: device-side assert triggered

On GPU with BCELoss (and focalloss):

criterion = torch.nn.BCELoss()

Training With 192 Features
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-20bc00643d0f> in <module>()
      5     optimizer=optimizer,
      6     criterion=criterion,
----> 7     checkpointdir=checkpointdir,
      8 )

6 frames
/content/drive/My Drive/herremans_data/src/wavenetmodel.py in train(self, epochs, dataloader, optimizer, criterion, checkpointdir)
     75     ):
     76         valid_loss_min = np.Inf
---> 77         self.model.to(self.device)
     78         self.model.float()
     79         for epoch in range(1, epochs + 1):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    671             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    672 
--> 673         return self._apply(convert)
    674 
    675     def register_backward_hook(

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    407                 # `with torch.no_grad():`
    408                 with torch.no_grad():
--> 409                     param_applied = fn(param)
    410                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    411                 if should_use_set_data:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in convert(t)
    669                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    670                             non_blocking, memory_format=convert_to_format)
--> 671             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    672 
    673         return self._apply(convert)

RuntimeError: CUDA error: device-side assert triggered

LSTM not working on CUDA

See error here: https://colab.research.google.com/drive/1gLQ3O1x12fa_vAJyMG6NUQBXQeRGw6FS?usp=sharing

Setting last_x_days=40

Three things here:

1. Allow this to be set in the main engine.
2. Allow different values: e.g. if I change it to 40 I get an error. Large values should be allowed; this may require dismissing input files that are too short.
3. After changing to 30 at the top of dataprocess.py, I still get:

X_train Shape (8602, 14, 192), Y_train Shape (8602, 1)
X_test Shape (2464, 14, 192), Y_test Shape(2464, 1)
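The second point can be sketched as follows. The function names and file names are hypothetical; the idea is to pass `last_x_days` in as a parameter, skip any file with fewer rows than one window, and compute how many windows to expect, so the second dimension of the reported shape can be checked against the setting.

```python
def split_by_length(file_lengths, last_x_days):
    """Return (usable, skipped) file names for a given window length."""
    usable = [name for name, n in file_lengths.items() if n >= last_x_days]
    skipped = [name for name, n in file_lengths.items() if n < last_x_days]
    return usable, skipped

def expected_windows(file_lengths, last_x_days):
    """Number of hop-size-1 windows when each file is windowed on its own."""
    usable, _ = split_by_length(file_lengths, last_x_days)
    return sum(file_lengths[name] - last_x_days + 1 for name in usable)

# Hypothetical file names and row counts.
lengths = {"a.csv": 50, "b.csv": 30, "c.csv": 100}
usable, skipped = split_by_length(lengths, last_x_days=40)
total = expected_windows(lengths, last_x_days=40)
print(usable, skipped, total)
```

With `last_x_days=40`, the resulting X would be expected to have shape `(total, 40, n_features)`, not `(…, 14, n_features)`.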

Disk usage exploding during training

I think some objects need to be destroyed at the end of a training loop, because training keeps using 50-100 GB of my disk space!

Data should be normalized per file, not per crypto name (as these occur multiple times, multi-exchange data)

#todo this can create issues, because actually pairs occur multiple times but should not be normalized the same! Just normalize for each file, not per crypto pair
        self.cryptos = [os.path.split(file)[1].split("_")[1] for file in files]
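A minimal sketch of per-file normalization. Min-max scaling is an assumption here (the sample tensors lie in [0, 1]), and the function name and toy exchange data are made up for illustration.

```python
import numpy as np

def minmax_per_file(features):
    """Scale each column of ONE file to [0, 1] using that file's own min and max."""
    features = np.asarray(features, dtype=float)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span

# Same pair from two hypothetical exchanges, on very different scales.
exchange_a = np.array([[100.0, 5.0], [150.0, 5.0], [200.0, 5.0]])
exchange_b = np.array([[1.0, 7.0], [2.0, 9.0], [3.0, 8.0]])
normalized = [minmax_per_file(f) for f in (exchange_a, exchange_b)]
print(normalized[0][:, 0], normalized[1][:, 0])
```

Each file ends up on its own [0, 1] scale, regardless of how many other files share the same pair name.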

timeseries across files - convert to timeseries?

What does the function

_convert_to_timeseries()

do?

I feel like a mistake is being made here: the input 'windows' are built across files.

Predict function for csv file

I believe you were also working on a predict function to add a column with prediction probabilities to a dataframe from 1 csv file:

jesse/dataset/production_data_for_new_prediction/production.csv

function input = dataframe
function output = column with prediction values (probabilities is fine)

(Do also keep the prediction on the 10% test set as it is now.)
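A rough sketch of what such a function could look like. Everything here is an assumption for illustration: the name `predict_dataframe`, the column names, the stand-in model, and the choice to attach each prediction to the last row of its window. The input layout (batch, window, features) follows the X shapes reported in the other issues.

```python
import numpy as np
import pandas as pd
import torch
from torch import nn

def predict_dataframe(df, model, window, feature_cols):
    """Return one probability per row; rows without a full window get NaN."""
    values = df[feature_cols].to_numpy(dtype=np.float32)
    probs = np.full(len(df), np.nan, dtype=np.float32)
    n_windows = len(df) - window + 1
    if n_windows > 0:
        # Hop-size-1 windows over this single dataframe only.
        windows = np.stack([values[i:i + window] for i in range(n_windows)])
        model_device = next(model.parameters()).device
        model.eval()
        with torch.no_grad():
            out = model(torch.from_numpy(windows).to(model_device))
        # Assumption: a prediction belongs to the last row of its window.
        probs[window - 1:] = out.detach().cpu().numpy().reshape(-1)
    return pd.Series(probs, index=df.index, name="prediction")

# Stand-in model and data, just to show the call.
window, cols = 4, ["f1", "f2", "f3"]
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(window * len(cols), 1), nn.Sigmoid())
frame = pd.DataFrame(np.random.rand(10, 3), columns=cols)
frame["prediction"] = predict_dataframe(frame, toy_model, window, cols)
print(frame.tail())
```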

Input processing - not enough rows

Something seems off with input processing.

X_train (on hourly_demo data) gives me a dataframe of 16k rows.

However, looking at the X_file, the second file alone has 27k lines, so X_train is way too small.

In addition, what is the strange index column?

CPU - RuntimeError: all elements of input should be between 0 and 1

Running on CPU, hourly data, BCELoss:

Note this is changed:

cols_to_pred = ["Buy_p40_a1"]

datadir = "../dataset/daily/"

Gives the error:

0%|          | 0/44 [00:00<?, ?it/s]Training With 192 Features
100%|██████████| 44/44 [02:40<00:00,  3.66s/it]
14%|█▍        | 1/7 [00:02<00:17,  2.84s/it]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-20bc00643d0f> in <module>()
    5     optimizer=optimizer,
    6     criterion=criterion,
----> 7     checkpointdir=checkpointdir,
    8 )

3 frames
/content/drive/My Drive/herremans_data/src/wavenetmodel.py in train(self, epochs, dataloader, optimizer, criterion, checkpointdir)
  109                 target = target.float()
  110                 output = self.model(data)  # .double())
--> 111                 loss = criterion(output, target.float())
  112                 valid_loss += (1 / (batch_idx + 1)) * (loss.data - valid_loss)
  113 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
  887             result = self._slow_forward(*input, **kwargs)
  888         else:
--> 889             result = self.forward(*input, **kwargs)
  890         for hook in itertools.chain(
  891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
  611     def forward(self, input: Tensor, target: Tensor) -> Tensor:
  612         assert self.weight is None or isinstance(self.weight, Tensor)
--> 613         return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  614 
  615 

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in binary_cross_entropy(input, target, weight, size_average, reduce, reduction)
 2760         weight = weight.expand(new_size)
 2761 
-> 2762     return torch._C._nn.binary_cross_entropy(input, target, weight, reduction_enum)
 2763 
 2764 

RuntimeError: all elements of input should be between 0 and 1
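`BCELoss` expects probabilities, so this message suggests the model output is not squashed to [0, 1] (or contains NaN values, which would also fail the range check). The GPU "device-side assert triggered" error reported with `BCELoss` may be the same problem surfacing differently. A minimal sketch of two common options, on made-up logits:

```python
import torch
from torch import nn

# Raw, unbounded model outputs (logits) and binary targets.
logits = torch.tensor([[2.5], [-1.0], [0.3]])
target = torch.tensor([[1.0], [0.0], [1.0]])

# Option 1: squash to probabilities, then use BCELoss.
probabilities = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probabilities, target)

# Option 2: keep the logits and let the loss apply the sigmoid internally.
loss_logits = nn.BCEWithLogitsLoss()(logits, target)

# A cheap sanity check before calling BCELoss in the training loop.
output_ok = bool(torch.isfinite(probabilities).all()
                 and (probabilities >= 0).all()
                 and (probabilities <= 1).all())
print(loss_bce.item(), loss_logits.item(), output_ok)
```

Both options compute the same loss. Option 2 is generally the more numerically stable of the two, but the model then has to return logits, and a sigmoid has to be applied separately at prediction time.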

Predict on

TypeError                                 Traceback (most recent call last)
<ipython-input-9-20bb03f284db> in <module>()
      1 model.load(checkpointdir=checkpointdir)
      2 
----> 3 model.predict(testdata=process.X_test, target=process.y_test)

/content/drive/My Drive/herremans_data/src/wavenetmodel.py in predict(self, testdata, target)
    147         self.model.eval()
    148         testdata = testdata.to(self.device)
--> 149         predictions = self.model(testdata.float()).detach().numpy()
    150         idxs = np.unique(np.where(np.isnan(predictions))[0]).tolist()
    151         preds = np.delete(predictions.copy(), idxs, axis=0)

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
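As the message says, the tensor has to be copied to host memory before the NumPy conversion. A minimal sketch with a hypothetical helper:

```python
import numpy as np
import torch

def to_numpy(tensor):
    """Detach from the autograd graph, copy to the CPU, convert to NumPy."""
    return tensor.detach().cpu().numpy()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
predictions = torch.rand(4, 1, device=device, requires_grad=True)
array = to_numpy(predictions)
print(type(array), array.shape)
```

In the traceback above, this would correspond to `self.model(testdata.float()).detach().cpu().numpy()`. Calling `.cpu()` on a tensor that is already on the CPU is harmless, so the same line works on both devices.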
