
damitkwr / esrnn-gpu

318 stars / 19 watchers / 72 forks / 76.96 MB

PyTorch GPU implementation of the ES-RNN model for time series forecasting

License: MIT License

R 4.28% Python 95.72%
forecasting pytorch deep-learning time-series-forecasting es-rnn deep-forecasting

esrnn-gpu's People

Contributors

aredd-cmu, catapulta, damitkwr, tjphilpot


esrnn-gpu's Issues

Epoch Loss not getting updated properly?

Edit: Never mind. The loss is simply being averaged; the division uses the final value of batch_num from the for-loop it is incremented in, so it runs once per epoch.

In trainer.py, inside the train method, epoch_loss is divided by batch_num + 1 after an epoch finishes. That made it look as though epoch_loss is forcefully decreased after every batch, since the denominator (batch_num) keeps getting bigger:

epoch_loss = epoch_loss / (batch_num + 1)

Maybe I'm misunderstanding something here, but it doesn't seem right that the loss is getting artificially decreased simply based on which batch the training loop is on. I looked through the original C++ implementation, but couldn't find anything that looked like the above line (I don't know C++ very well, so that may be why).
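
For anyone skimming this later, a minimal sketch of that averaging pattern (the batch_losses values are made up):

    # batch_num keeps its final value from the for-loop, so the single division
    # after the loop yields the mean per-batch loss for the epoch, not a loss
    # that shrinks as training progresses.
    batch_losses = [4.0, 2.0, 3.0]  # stand-in per-batch losses
    epoch_loss = 0.0
    for batch_num, loss in enumerate(batch_losses):
        epoch_loss += loss
    epoch_loss = epoch_loss / (batch_num + 1)  # 9.0 / 3 == 3.0, the epoch mean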

P.S.
Thanks for the Python/PyTorch implementation of this project, by the way; it's a great resource for learning good forecasting methods and strategies.

How to change the config for a different timeframe

How would one update the config for Hourly or Daily data?

On Daily I seem to be getting errors.

On hourly I have in the config:

'chop_val': 200,
'variable': "Hourly",
'dilations': ((1, 12), (12, 24)),
'state_hsize': 50,
'seasonality': 24,
'input_size': 24,
'output_size': 48,
'level_variability_penalty': 50

and I get the error

input.size(-1) must be equal to input_size. Expected 30, got 25

On Daily (default) I have:

'chop_val': 200,
'variable': "Daily",
'dilations': ((1, 7), (14, 28)),
'state_hsize': 50,
'seasonality': 7,
'input_size': 7,
'output_size': 14,
'level_variability_penalty': 50

and I get the error:

ValueError: Item wrong length 4226 instead of 4227.

Any advice on how to proceed would be appreciated.
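
Not an answer, but one way to pin down where the 30-vs-25 mismatch comes from is to print the shapes of the tensors reaching the recurrent layers right before the failing size check. A hedged sketch with placeholder names (attach_shape_hooks is not part of the repo; pass it whatever model object main.py builds):

    import torch.nn as nn

    def attach_shape_hooks(model):
        """Print the input shapes of every recurrent module just before it runs."""
        def shape_hook(module, inputs):
            shapes = [tuple(t.shape) for t in inputs if hasattr(t, 'shape')]
            print(module.__class__.__name__, 'received', shapes)
        for m in model.modules():
            if isinstance(m, (nn.RNN, nn.LSTM, nn.GRU)):
                m.register_forward_pre_hook(shape_hook)

    # Usage (placeholder name): attach_shape_hooks(esrnn_model) before training starts.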

Poor Performance on GCP V100s

I'm running the code nearly unchanged on a Google Cloud compute instance with 2x NVIDIA V100 GPUs, 60 GB RAM, and 16 CPUs. config.py is unchanged.

With 15 epochs on the Quarterly data, total training time is 16.01 minutes, almost double the 8.94 minutes reported in the paper. However, the validation results at the end of epoch 15 are nearly identical to the paper's reported results:

{'Demographic': 10.814908027648926, 'Finance': 10.71678638458252, 'Industry': 7.436440944671631, 'Macro': 9.547700881958008, 'Micro': 11.63847827911377, 'Other': 7.911505699157715, 'Overall': 10.091866493225098, 'loss': 7.8162946701049805}

When I removed the model saving step, training time decreased to 15.76 minutes.

I downloaded the dataset from the provided link, and made no changes.

I'm using updated package versions, although I wouldn't expect them to double the training time:

  • pytorch 1.2
  • tensorflow 1.14.0

What hardware configuration did the authors test on? I'm using dual V100s, the highest-end GPUs available on GCP, so I'd expect to match or outperform the reported benchmarks. Do you have any thoughts on why performance is considerably worse in my setup?
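
For what it's worth, a quick sanity check that the run is actually on the GPU and how many devices PyTorch sees (the model reference in the last line is a placeholder):

    import torch

    print(torch.__version__)                  # 1.2.0 in this setup
    print(torch.cuda.is_available())          # should be True on the V100 instance
    print(torch.cuda.device_count())          # 2 here, though the code may only use one
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. 'Tesla V100-SXM2-16GB'
    # print(next(model.parameters()).device)  # placeholder: confirms the model sits on cuda:0

If the training loop only places the model on a single device, the second V100 would sit idle, which might account for part of the gap; that is a guess, not something I've verified.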

What is info.csv in es-rnn (main.py)?

Hi,
I am trying to use the code, but I've run into a couple of issues. One is that

train_path = '../data/Train/%s-train.csv' % (config['variable'])
test_path = '../data/Test/%s-test.csv' % (config['variable'])

in main.py causes problems. If I understand the code correctly, it resolves the path to "../data/Train/Daily-train.csv", a CSV file that does not exist in my checkout. The other issue is that I do not really understand what information info.csv should contain. Could you please help me with these two problems?
Thanks
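
Not authoritative, but the per-category validation keys shown elsewhere on this page (Micro, Macro, Industry, Finance, Demographic, Other) suggest info.csv is the M4 metadata file mapping each series id to its domain category, while Daily-train.csv is part of the M4 competition data that presumably has to be downloaded separately and placed under data/Train/ and data/Test/. A hedged sketch for inspecting the metadata (the column names are assumptions):

    import pandas as pd

    # Assumed layout: one row per series with its id, domain category
    # (Micro, Macro, Industry, Finance, Demographic, Other), frequency and horizon.
    info = pd.read_csv('../data/info.csv')
    print(info.columns.tolist())
    print(info.head())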

Train and test csv formatting

Hey, thank you for publishing your results; they are very impressive.
What is the CSV format for Train and Test? I've noticed that read_file creates arrays of different shapes:
Train ends up with shape (number_of_series,)
Test ends up with shape (number_of_series, time_steps)

I would like to reproduce this on my own data. How should I format the Train CSV with pd.to_csv so that it is processed correctly by your code?

Thanks!
Best regards
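
Based on the shapes described above, a hedged sketch of writing your own series in the same ragged layout: one row per series, the id in the first column, values left-aligned and NaN-padded to a common width (the V1, V2, ... column names and the MyVariable file name are assumptions):

    import numpy as np
    import pandas as pd

    # Two series of different lengths; the shorter one is padded with NaN so every
    # row has the same number of cells. read_file presumably drops the NaNs again,
    # which is why Train comes back as an object array of shape (number_of_series,).
    rows = [
        ['MY1', 10.0, 11.5, 12.1, 13.0],
        ['MY2', 7.2, 7.9, np.nan, np.nan],
    ]
    cols = ['V1'] + ['V%d' % i for i in range(2, 6)]
    pd.DataFrame(rows, columns=cols).to_csv('MyVariable-train.csv', index=False)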

ModuleNotFoundError: No module named 'ESRNN.m4_data'; 'ESRNN' is not a package

Hello,

I've tried to install ESRNN following the instructions at https://pypi.org/project/ESRNN/, i.e. pip install ESRNN.

However, when I try to run the following code:

from ESRNN.m4_data import prepare_m4_data
from ESRNN.utils_evaluation import evaluate_prediction_owa

from ESRNN import ESRNN

I get the following error:

Traceback (most recent call last):
  File "ESRNN.py", line 2, in <module>
    from ESRNN.m4_data import prepare_m4_data
  File "C:\Users\mario\Documents\Python Benjamin\ESRNN.py", line 2, in <module>
    from ESRNN.m4_data import prepare_m4_data
ModuleNotFoundError: No module named 'ESRNN.m4_data'; 'ESRNN' is not a package

I've tried adding ESRNN as a path variable and still get the same error.

Could anyone please assist?
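
Judging from the traceback, the script being run is itself named ESRNN.py (C:\Users\mario\Documents\Python Benjamin\ESRNN.py), so from ESRNN.m4_data import ... resolves to that local file rather than to the installed package. A hedged fix: rename the script (run_esrnn.py below is just a suggested name), remove any __pycache__ or ESRNN.pyc left next to it, and confirm the real package is picked up:

    # after renaming your script, e.g. ESRNN.py -> run_esrnn.py
    import ESRNN
    print(ESRNN.__file__)  # should point into site-packages, not into your own folder

    from ESRNN.m4_data import prepare_m4_data
    from ESRNN.utils_evaluation import evaluate_prediction_owa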

max_loss is not being updated

Hi,

It seems that max_loss in train_epochs() in esrnn/trainer.py is never actually updated:

def train_epochs(self):
        max_loss = 1e8
        start_time = time.time()
        for e in range(self.max_epochs):
            self.scheduler.step()
            epoch_loss = self.train()
            if epoch_loss < max_loss:
                self.save()
            epoch_val_loss = self.val()
            if e == 0:
                file_path = os.path.join(self.csv_save_path, 'validation_losses.csv')
                with open(file_path, 'w') as f:
                    f.write('epoch,training_loss,validation_loss\n')
            with open(file_path, 'a') as f:
                f.write(','.join([str(e), str(epoch_loss), str(epoch_val_loss)]) + '\n')
        print('Total Training Mins: %5.2f' % ((time.time()-start_time)/60))

Thanks!
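
A hedged sketch of what the intended logic presumably is, written as a standalone helper rather than a patch (maybe_save_checkpoint is a made-up name):

    def maybe_save_checkpoint(epoch_loss, best_loss, save_fn):
        """Save only when the epoch improved on the best loss seen so far.

        In the quoted train_epochs(), max_loss is compared against but never
        reassigned, so self.save() fires on every epoch whose loss falls below
        the initial 1e8, i.e. effectively every epoch.
        """
        if epoch_loss < best_loss:
            best_loss = epoch_loss  # the update the original loop is missing
            save_fn()
        return best_loss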

AttributeError: tensorboard summary has no attribute 'FileWriter'

AttributeError: module 'tensorboard.summary._tf.summary' has no attribute 'FileWriter'

How can I fix this?
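
This usually means the installed TensorFlow/TensorBoard only exposes the TF 2.x-style summary API, which no longer has FileWriter. Two hedged workarounds, assuming the repo's logging code is what calls tf.summary.FileWriter; pinning an older 1.x TensorFlow (e.g. the tensorflow 1.14.0 mentioned in the GCP issue above) is another route:

    # Option 1: go through the TF1 compatibility layer that still ships with TF 2.x.
    import tensorflow as tf
    writer = tf.compat.v1.summary.FileWriter('./logs')

    # Option 2: skip TensorFlow entirely and use PyTorch's own TensorBoard writer.
    from torch.utils.tensorboard import SummaryWriter
    tb = SummaryWriter(log_dir='./logs')
    tb.add_scalar('train/epoch_loss', 0.5, global_step=0)
    tb.close()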

Dataset is so complicated

Hi, I've tried to understand the dataset and how the model is actually trained on it, but the information about the competition no longer seems to be available.
Can you explain how the data is loaded, for example from Monthly-train.csv?
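
No definitive answer here, but a hedged sketch of reading Monthly-train.csv back into the ragged per-series structure described in the formatting issue above (one variable-length array per series, NaN padding dropped):

    import pandas as pd

    df = pd.read_csv('../data/Train/Monthly-train.csv')
    series_ids = df.iloc[:, 0].tolist()              # first column: series id
    train_series = [
        row.dropna().to_numpy(dtype=float)           # remaining columns: the values, NaN-padded
        for _, row in df.iloc[:, 1:].iterrows()
    ]
    print(len(train_series), train_series[0].shape)  # one ragged array per series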

The prediction intervals code

Sorry to bother you. I can't find which part of this code implements the prediction intervals (PI). Is this code only for point forecasts (PF)?
Thank you.

About the input and output

Hi there,
My dataset only has two columns, Date and Price. Is it possible for both the input and the output to be the price? If so, how should I split it into x_train, y_train and x_test, y_test? Will the algorithm do this for me automatically, or does having only these two columns mean I can't use this algorithm?
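
Not the repo's own pipeline, but a hedged sketch of how a single Date/Price series can be cut into input/output windows mirroring the config's input_size/output_size (the prices.csv file name is hypothetical):

    import numpy as np
    import pandas as pd

    df = pd.read_csv('prices.csv', parse_dates=['Date']).sort_values('Date')
    prices = df['Price'].to_numpy(dtype=float)

    input_size, output_size = 7, 14  # e.g. the Daily settings quoted earlier on this page
    x, y = [], []
    for start in range(len(prices) - input_size - output_size + 1):
        x.append(prices[start:start + input_size])                             # past prices as input
        y.append(prices[start + input_size:start + input_size + output_size])  # future prices as target
    x, y = np.array(x), np.array(y)
    print(x.shape, y.shape)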

Questions/Clarifications on ESRNN-GPU vs original ES-RNN

Great work on this project! Having used a version of the original ES-RNN code, I have a few questions about the differences between the implementations and the results presented in the paper.

  1. The paper mentions "Note that for monthly data, Smyl et al. (2018) were running the algorithm of 6 pairs of 2 workers and for quarterly data, 4 pairs of 2 workers were used." For the results, did the times reported in the paper represent running the ESRNN-GPU implementation with multiple workers in aggregate (CPU Time), multiple workers concurrently (Wall Clock Time) or was the time reported for a single worker?

  2. What GPU did you test on? Testing the M4 data set with CUDA-enabled PyTorch on a notebook graphics card (NVIDIA GeForce GTX 1050) vs. non-CUDA PyTorch showed the CPU-only version (on an i7 8550) to be faster by about 3x. It is likely that the CPU-only PyTorch version is still faster than the original ES-RNN, but I have not confirmed that.

  3. Is there any plan to implement the future work on variable-length series mentioned in Section 8.1? What would be required?

Will it work for multivariate time series prediction (both regression and classification)?

Great code, thanks. Could you clarify whether it will work for multivariate time series prediction, for both regression and classification?

1. Where all values are continuous, for example:

   weight  height  age  target
1    56      160    34    1.2
2    77      170    54    3.5
3    87      167    43    0.7
4    55      198    72    0.5
5    88      176    32    2.3

2. Or where the values are a mixture of continuous and categorical, for example two continuous dimensions and three categorical dimensions:

   color   weight  gender  height  age  target
1  black     56      m      160    34   yes
2  white     77      f      170    54   no
3  yellow    87      m      167    43   yes
4  white     55      m      198    72   no
5  white     88      f      176    32   yes
