Comments (8)

sooftware commented on May 28, 2024

As a temporary workaround, I wrote a shell loop that automatically restarts training from the last checkpoint.

for (( ; ; ))
do
    for entry in "./checkpoints"/*
    do
        last_checkpoint=$entry
    done

    tokens=$(echo $last_checkpoint | tr "/" "\n")
    for token in $tokens
    do
       checkpoint="$token"
    done
    echo "Load checkpoint : $checkpoint"

    PYTHONIOENCODING=utf-8 python3 train.py --hyper_parameters generated_training --checkpoint $checkpoint
done
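
The loop above picks whichever entry the shell glob lists last, which is alphabetical order rather than creation order. A minimal Python sketch of the same selection step, assuming checkpoints are plain files in `./checkpoints` and that modification time reflects training progress (the helper name `latest_checkpoint` is hypothetical and not part of the repository):

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str = "./checkpoints") -> Optional[str]:
    """Return the file name of the most recently modified checkpoint.

    Returns None when the directory is missing or contains no files.
    Assumption: a newer modification time means a later training state.
    """
    root = Path(directory)
    if not root.is_dir():
        return None
    files = [p for p in root.iterdir() if p.is_file()]
    if not files:
        return None
    # Like the shell loop, return only the last path component.
    return max(files, key=lambda p: p.stat().st_mtime).name
```

The returned name could then be passed to `--checkpoint` the same way the shell loop does.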

from multilingual_text_to_speech.

Tomiinek commented on May 28, 2024

This is really strange. I used 2 x 8GB GTX1080 for training and I have never experienced this problem 😕

The script you posted seems to be OK, so I am not sure whether I can help you ... Could it be that all four runs are executed on the same single GPU (i.e., the other three are left untouched)?

sooftware commented on May 28, 2024

Hmm, that's weird. 😢😢
If I find a problem, I'll write it down here. Thanks for the answer.

sooftware commented on May 28, 2024

I trained on data that includes Korean and English.
As mentioned in the paper, I limited the audio clips to between 0.5 and 10.1 seconds and found that the OOM error no longer occurs.
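
A minimal sketch of that length filter, assuming each training item is a `(path, duration_in_seconds)` pair; the function name and the item layout are hypothetical and not taken from the repository:

```python
from typing import Iterable, List, Tuple

MIN_SECONDS = 0.5   # lower bound quoted in this thread
MAX_SECONDS = 10.1  # upper bound quoted in this thread


def filter_by_duration(
    items: Iterable[Tuple[str, float]],
    min_seconds: float = MIN_SECONDS,
    max_seconds: float = MAX_SECONDS,
) -> List[Tuple[str, float]]:
    """Keep only the items whose duration lies within the inclusive bounds."""
    return [
        (path, duration)
        for path, duration in items
        if min_seconds <= duration <= max_seconds
    ]
```

Dropping very long clips bounds the largest batch the model ever sees, which is one plausible reason the OOM error disappeared.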

Tomiinek commented on May 28, 2024

Oh yes, good point! 👍 Thanks.

dina-adel commented on May 28, 2024

I encountered the same error while training on Arabic data, even though the lengths of my audio clips are within that range. Any suggestions?

Tomiinek commented on May 28, 2024

Hello, what size are your GPUs? Do you also use 2 x 8GB? Have you changed some parameters such as batch size or some audio-related stuff? What about the maximal length of your transcripts?

divineSix commented on May 28, 2024

I'm trying to train this model on my data and I'm getting a CUDA OOM error during the second epoch. The first epoch is completed successfully, and the second epoch has 20% of the batches processed when this fails. My setup is 4x 12GB 1080s.

My dataset covers 6 languages, and I'm using the default multilingual hyperparameters except for the batch size, which is 24. My training dataloader has 2375 batches.

I've been trying to solve this for a while now and will summarize my attempts below:

  • Tried it with a very small version of my dataset (dataloader with 30 batches), and did not run into any CUDA issues early on. It seems some memory is not being released after each iteration, so it builds up once training runs for enough steps.
  • **This leads me to believe that there are memory artifacts accumulating every iteration/step.** i.e.
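
One way to test the accumulation hypothesis is to record allocated GPU memory after every training step and check whether it keeps rising. The sketch below only analyses such a list of readings. It assumes the readings are byte counts collected once per step, for example with `torch.cuda.memory_allocated()` if the training code uses PyTorch; the function name and thresholds are hypothetical.

```python
from typing import Sequence


def grows_steadily(
    readings: Sequence[int],
    min_steps: int = 10,
    tolerance: float = 0.9,
) -> bool:
    """Return True if memory rose in at least `tolerance` of consecutive steps.

    `readings` holds one allocated-memory value per training step.
    Too few readings are treated as inconclusive and yield False.
    """
    if len(readings) < max(min_steps, 2):
        return False
    # Count step-to-step transitions where the reading strictly increased.
    increases = sum(1 for prev, cur in zip(readings, readings[1:]) if cur > prev)
    return increases / (len(readings) - 1) >= tolerance
```

A flat or oscillating series suggests that memory is being released between steps, while a nearly monotonic rise is consistent with something being retained on every iteration.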
