Comments (8)
Sorry for the late response.
I added plt.close("all"), but then I suddenly started getting messages about freeing resources outside the main thread.
So, based on your instructions, I changed the method to read:
@staticmethod
def evaluation(eval_step, losses, mcd, source_len, target_len, source, target, prediction_forced, prediction, stop_prediction, stop_target, alignment, classifier):
    """Log evaluation results.

    Arguments:
        eval_step -- number of the current evaluation step (i.e. epoch)
        losses (dictionary of {loss name, value}) -- dictionary with values of batch losses
        mcd (float) -- evaluation Mel Cepstral Distortion
        source_len (tensor) -- number of characters of input utterances
        target_len (tensor) -- number of frames of ground-truth spectrograms
        source (tensor) -- input utterances
        target (tensor) -- ground-truth spectrograms
        prediction_forced (tensor) -- ground-truth-aligned spectrograms
        prediction (tensor) -- predicted spectrograms
        stop_prediction (tensor) -- predicted stop token probabilities
        stop_target (tensor) -- true stop token probabilities
        alignment (tensor) -- alignments (attention weights for each frame) of the last evaluation batch
        classifier (float) -- accuracy of the reversal classifier
    """

    # log losses
    total_loss = sum(losses.values())
    Logger._sw.add_scalar('Eval/loss_total', total_loss, eval_step)
    for n, l in losses.items():
        Logger._sw.add_scalar(f'Eval/loss_{n}', l, eval_step)

    # show random sample: spectrogram, stop token probability, alignment and audio
    idx = random.randint(0, alignment.size(0) - 1)
    predicted_spec = prediction[idx, :, :target_len[idx]].data.cpu().numpy()
    f_predicted_spec = prediction_forced[idx, :, :target_len[idx]].data.cpu().numpy()
    target_spec = target[idx, :, :target_len[idx]].data.cpu().numpy()

    # log spectrograms
    if hp.normalize_spectrogram:
        predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
        f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
        target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
    f = Logger._plot_spectrogram(predicted_spec)
    Logger._sw.add_figure("Predicted/generated", f, eval_step)
    plt.close(f)
    f = Logger._plot_spectrogram(f_predicted_spec)
    Logger._sw.add_figure("Predicted/forced", f, eval_step)
    plt.close(f)
    f = Logger._plot_spectrogram(target_spec)
    Logger._sw.add_figure("Target/eval", f, eval_step)
    plt.close(f)

    # log audio
    waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
    Logger._sw.add_audio("Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)
    waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
    Logger._sw.add_audio("Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)

    # log alignment
    alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
    f = Logger._plot_alignment(alignment)
    Logger._sw.add_figure("Alignment/eval", f, eval_step)
    plt.close(f)

    # log source text
    utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
    Logger._sw.add_text("Text/eval", utterance, eval_step)

    # log stop tokens
    f = Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy())
    Logger._sw.add_figure("Stop/eval", f, eval_step)
    plt.close(f)

    # log mel cepstral distortion
    Logger._sw.add_scalar('Eval/mcd', mcd, eval_step)

    # log reversal language classifier accuracy
    if hp.reversal_classifier:
        Logger._sw.add_scalar('Eval/classifier', classifier, eval_step)
So far so good. Six epochs into the resumed training, Xorg memory is no longer increasing with every training loop, and there have been no crashes.
from multilingual_text_to_speech.
Hello, thank you for your observation!
Unfortunately, I cannot replicate the problem.
The code does not explicitly dispose of the figures that are passed into TensorBoard's SummaryWriter. However, the documentation of SummaryWriter.add_figure(tag, figure, global_step=None, close=True, walltime=None) says that the call should automatically close the figure if close=True.
Can you please change the utils/logging.py file as follows and test whether it works?
...
# log spectrograms
if hp.normalize_spectrogram:
    predicted_spec = audio.denormalize_spectrogram(predicted_spec, not hp.predict_linear)
    f_predicted_spec = audio.denormalize_spectrogram(f_predicted_spec, not hp.predict_linear)
    target_spec = audio.denormalize_spectrogram(target_spec, not hp.predict_linear)
f = Logger._plot_spectrogram(predicted_spec)
Logger._sw.add_figure("Predicted/generated", f, eval_step)
plt.close(f)
f = Logger._plot_spectrogram(f_predicted_spec)
Logger._sw.add_figure("Predicted/forced", f, eval_step)
plt.close(f)
f = Logger._plot_spectrogram(target_spec)
Logger._sw.add_figure("Target/eval", f, eval_step)
plt.close(f)

# log audio
waveform = audio.inverse_spectrogram(predicted_spec, not hp.predict_linear)
Logger._sw.add_audio("Audio/generated", waveform, eval_step, sample_rate=hp.sample_rate)
waveform = audio.inverse_spectrogram(f_predicted_spec, not hp.predict_linear)
Logger._sw.add_audio("Audio/forced", waveform, eval_step, sample_rate=hp.sample_rate)

# log alignment
alignment = alignment[idx, :target_len[idx], :source_len[idx]].data.cpu().numpy().T
f = Logger._plot_alignment(alignment)
Logger._sw.add_figure("Alignment/eval", f, eval_step)
plt.close(f)

# log source text
utterance = text.to_text(source[idx].data.cpu().numpy()[:source_len[idx]], hp.use_phonemes)
Logger._sw.add_text("Text/eval", utterance, eval_step)

# log stop tokens
Logger._sw.add_figure("Stop/eval", Logger._plot_stop_tokens(stop_target[idx].data.cpu().numpy(), stop_prediction[idx].data.cpu().numpy()), eval_step)
...
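For reference, the figure-accumulation mechanism behind this leak can be demonstrated in isolation. This is a minimal sketch, independent of the repository's code, using matplotlib's headless Agg backend; it shows that pyplot keeps every figure in a global registry until it is closed, which is why the explicit plt.close(f) / plt.close("all") calls matter:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no GUI/Xorg involvement
import matplotlib.pyplot as plt

# pyplot tracks every figure it creates in a global registry; figures stay
# alive there even when no user code holds a reference to them.
figs = [plt.figure() for _ in range(3)]
assert len(plt.get_fignums()) == 3  # all three are still registered

plt.close(figs[0])            # close one specific figure
assert len(plt.get_fignums()) == 2

plt.close("all")              # close every remaining registered figure
assert len(plt.get_fignums()) == 0
```

The same registry is why SummaryWriter.add_figure's close=True behavior matters: if the figure were not closed after being rendered, it would stay registered for the lifetime of the process.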
Thank you very much.
Sorry for the late response.
I added plt.close("all") as the last statement in both the evaluation reporting and the training reporting.
This seems to have solved the issue where figures that were never displayed on screen caused the Xorg server to keep reserving memory.
ack, now I'm getting crashes:
Exception ignored in: <function Image.__del__ at 0x7efbc32268c0>
Traceback (most recent call last):
File "/home/muksihs/git/Multilingual_Text_to_Speech/env/lib/python3.7/tkinter/__init__.py", line 3507, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
(the same exception repeats several more times)
I still cannot reproduce your issue.
This issue concerns something similar, and it was solved with the following change: "I fixed issue #5 by changing the backend of matplotlib from Tkinter (TkAgg) to PyQt5 (Qt5Agg)."
(See https://stackoverflow.com/questions/14694408/runtimeerror-main-thread-is-not-in-main-loop and http://matplotlib.1069221.n5.nabble.com/Matplotlib-Tk-and-multithreading-td40647.html)
Another way is probably to remove the plt.close(...) calls I suggested above and to occasionally force garbage collection explicitly:

import gc
gc.collect()

Can you try it out and let me know, please?
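A minimal sketch of that garbage-collection variant. Note one caveat: figures created through pyplot stay in its global registry and are never garbage-collected until closed, so this sketch builds Figure objects directly instead; make_eval_figure is a hypothetical stand-in for the logger's plotting helpers, not code from the repository:

```python
import gc
from matplotlib.figure import Figure  # Figures built directly (without pyplot)
                                      # are NOT held in pyplot's global registry

def make_eval_figure():
    # hypothetical stand-in for Logger._plot_spectrogram and friends
    fig = Figure()
    ax = fig.add_subplot(111)
    ax.plot([0, 1], [1, 0])
    return fig

fig = make_eval_figure()
del fig                    # drop the last reference instead of closing it
collected = gc.collect()   # force a collection pass; returns the number of
                           # unreachable objects found (figures often contain
                           # internal reference cycles)
```

Because matplotlib objects tend to form reference cycles, an explicit gc.collect() at the end of each evaluation pass reclaims them deterministically instead of waiting for the cyclic collector to run on its own schedule.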
ok, I added PyQt5 to the environment and added the following to the main script:

import matplotlib
matplotlib.use("Qt5Agg")

And I'm resuming training now.
With the plt.close(f) code, no crashes because of thread violations so far (5 hours of run time).
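One detail worth noting (general matplotlib behavior, not specific to this repository): matplotlib.use() should run before pyplot is imported anywhere in the process, otherwise the default backend (often TkAgg) may already be initialized. A minimal sketch, using Agg instead of Qt5Agg so it runs without a display server:

```python
# Select the backend before pyplot is imported anywhere in the process.
import matplotlib
matplotlib.use("Agg")  # "Qt5Agg" in the comment above; Agg here so the
                       # sketch needs no display server or GUI toolkit
import matplotlib.pyplot as plt  # pyplot now initializes with that backend

assert matplotlib.get_backend().lower() == "agg"
fig = plt.figure()   # no Tk/Qt window machinery is created for this figure
plt.close(fig)
```

For a training script that only ever writes figures to TensorBoard, a non-interactive backend like Agg sidesteps the Tk main-loop issue entirely, since no GUI thread is involved at all.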
I am glad to hear that!