Hi, Doctor. Sorry to bother you again. When I tried to run Reverse-T

CUDA out of memory when running Reverse-Time Migration of Marmousi example about deepwave HOT 2 CLOSED

ar4 commented on August 29, 2024

CUDA out of memory when running Reverse-Time Migration of Marmousi example

from deepwave.

Comments (2)

ar4 commented on August 29, 2024

Hello again,

PyTorch's caching, when it decides to free unused memory, and the use of optimizers that accumulate information over iterations, make it hard to predict memory usage. I see that you have already tried to force the cache to be emptied. You might try to expand that a bit to something like this to see if it helps (you will need to add import gc):

del loss, simulated_data
gc.collect()
torch.cuda.empty_cache()

If that is not sufficient, then you will probably have to use smaller batch sizes. You can still perform the optimizer step with gradients from the same number of shots, if you wish, by accumulating the gradients over multiple batches before performing a step. You could do that with something like this (where I accumulate the gradients over two batches, each half the size of yours, before performing an optimizer step):

n_epochs = 1
n_batch = 120
n_shots_per_batch = (n_shots + n_batch - 1) // n_batch
for epoch in range(n_epochs):
    epoch_loss = 0
    for outer_batch in range(n_batch//2):
        optimiser.zero_grad()
        for inner_batch in range(2):
            batch = outer_batch * 2 + inner_batch
            print(batch)
            batch_start = batch * n_shots_per_batch
            batch_end = min(batch_start + n_shots_per_batch, n_shots)
            if batch_end <= batch_start:
                continue
            s = slice(batch_start, batch_end)

            simulated_data = scalar_born(v_mig.detach(), scatter, dx, dt,
                                         source_amplitudes=source_amplitudes[s].detach(),
                                         source_locations=source_locations[s].detach(),
                                         receiver_locations=receiver_locations[s].detach(),
                                         pml_freq=freq)
            loss = (1e9 * loss_fn(simulated_data[-1] * mask[s], observed_scatter_masked[s]))
            epoch_loss += loss.item()
            loss.backward()
        optimiser.step()
    print(epoch_loss)

If you are going to perform an optimizer step after each batch (whether bigger batches, as you were doing, or when accumulating over multiple smaller batches as in my example above), then I suggest that you might want to randomise which shots are in each batch between epochs.

from deepwave.

XuVV commented on August 29, 2024

Thank you very much, Doctor. using del loss, simulated_data gc.collect() torch.cuda.empty_cache() can solve this problem.

from deepwave.

Recommend Projects

CUDA out of memory when running Reverse-Time Migration of Marmousi example about deepwave HOT 2 CLOSED

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent