
Failure on first epoch (big-sleep issue, CLOSED)

nhalsteadvt commented on August 14, 2024
Failure on first epoch

from big-sleep.

Comments (11)

lucidrains commented on August 14, 2024

@nhalsteadvt hmm, could you try upgrading your torchvision?

nhalsteadvt commented on August 14, 2024

@lucidrains I thought 1.7.1 was the latest version after checking back here
I think the error might be between PyTorch and CUDA, but I couldn't find what versions I needed on this repo.

lucidrains commented on August 14, 2024

@nhalsteadvt what is your current cuda version? I'm running 10.2

nhalsteadvt commented on August 14, 2024

@lucidrains I believe I'm running 11.2. I'll try to reinstall pytorch with CUDA 10.2 / make 10.2 (which I have installed) the active version.

lucidrains commented on August 14, 2024

@nhalsteadvt ohh sorry, actually i am running 11.1, so it should be fine!

enricoros commented on August 14, 2024

Verified working with CUDA 10.1 and PyTorch for CUDA 10.1 as well.

enricoros commented on August 14, 2024

@nhalsteadvt Still experiencing issues?

nhalsteadvt commented on August 14, 2024

@enricoros Yeah it's a different error now about CUDA out of memory. I thought 15.8 gigs of usable RAM was enough, but it seems something else is wrong.

"RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 4.08 GiB already allocated; 1.16 MiB free; 4.18 GiB reserved in total by PyTorch)"

This stuff really isn't my strong suit, but it looks like I don't have something configured right to use my GPU. I have 70 GB of storage space, if that means anything.
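A note on reading that message: the figures in it describe the GPU's own memory ("GPU 0; 6.00 GiB total capacity"), not system RAM or disk space. As a rough sketch, the helper below (`parse_oom` is a made-up name for illustration, not part of PyTorch) pulls the numbers out of the quoted error text so they can be compared directly. It assumes the message keeps the wording shown above.

```python
import re

# Matches figures such as "6.00 GiB total capacity" or "1.16 MiB free".
_FIELD = re.compile(
    r"([\d.]+) (MiB|GiB) (total capacity|already allocated|free|reserved)"
)
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message: str) -> dict:
    """Return the memory figures found in a CUDA OOM message, in MiB."""
    return {
        label: float(value) * _TO_MIB[unit]
        for value, unit, label in _FIELD.findall(message)
    }


# The error text quoted in the comment above.
msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 4.08 GiB already allocated; "
    "1.16 MiB free; 4.18 GiB reserved in total by PyTorch)"
)

stats = parse_oom(msg)
for label, mib in stats.items():
    print(f"{label:>18}: {mib:9.2f} MiB")
```

Read this way, the card has 6 GiB in total, PyTorch already holds about 4.18 GiB of it, and only about 1 MiB is free, so a further 20 MiB request cannot be served.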

enricoros commented on August 14, 2024

@nhalsteadvt It depends on the memory on the video card. With 8 GB of video memory (RTX 2070) I can run size=128 and size=256 images with no problem, but you need more memory for size=512 (it stops after hundreds of iterations). What video card do you have? As an alternative, you can run this project using the "simplified notebook" that you see on the home page, where the cards are NVIDIA T4s on the Google Cloud.

nhalsteadvt commented on August 14, 2024

@enricoros I've been using the notebook a bit, so that's cool. Task manager says I have 7.9GB of shared memory between my Intel and Nvidia graphics cards. However, the DirectX Diagnostic tool says I have 8095MB (8.095GB) of shared memory.
How would I alter the image size?

edit: error now says "RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`"
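On the image-size question, a minimal sketch, with two assumptions: that big-sleep's `Imagine` class accepts an `image_size` keyword, and that the sizes mentioned earlier in this thread (128, 256, 512) are the valid choices. Neither is confirmed in this thread, so check the repository README. `imagine_kwargs` and `run` are hypothetical helper names used only for this example.

```python
# Sizes mentioned in this thread; assumed here to be the only valid ones.
SIZES = (128, 256, 512)


def imagine_kwargs(text: str, image_size: int = 256) -> dict:
    """Build keyword arguments for Imagine, rejecting unexpected sizes."""
    if image_size not in SIZES:
        raise ValueError(f"image_size must be one of {SIZES}, got {image_size}")
    # `image_size` as a keyword name is an assumption about big-sleep's API.
    return {"text": text, "image_size": image_size}


def run(text: str, image_size: int = 256) -> None:
    """Start a run at the given size (needs big-sleep and a CUDA GPU)."""
    # Imported lazily so the helper above works without big-sleep installed.
    from big_sleep import Imagine

    dream = Imagine(**imagine_kwargs(text, image_size))
    dream()


# Inspect the arguments without starting a run.
print(imagine_kwargs("fire in the sky", image_size=128))
```

Calling `run("fire in the sky", image_size=128)` would then start at the smallest size, which is the cheapest way to find out whether the card's memory is the limit.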

enricoros commented on August 14, 2024

@nhalsteadvt For the Task Manager stats, look at the "Dedicated GPU memory" value. The shared GPU memory doesn't mean much (mine has 32GB shared, don't know where that's coming from). It's the dedicated that counts. For example, when running the code right now, I see "Dedicated GPU memory: 7.4/8.0GB", since roughly 90% of the GPU memory is allocated for this operation.

As for the CUDA errors, you should make sure that the CUDA version installed on your system matches what PyTorch expects. For instance, I don't have the latest CUDA; I have a stable one (10.2 on Windows) that can be downloaded here: https://developer.nvidia.com/cuda-10.2-download-archive. Then, when downloading PyTorch, I select the same combo (Windows, CUDA 10.2) on the website. Finally, I also download the cuDNN that matches the CUDA version here: https://developer.nvidia.com/rdp/cudnn-download#a-collapse805-102 (selecting 10.2). Yeah, it ain't pretty to get a system working nicely.
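To see what is actually installed before reinstalling anything, something along these lines may help. It is only a sketch: `version_report` is a made-up helper name, and it relies on the standard PyTorch attributes `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`. Packages that are missing or fail to load are reported as `None` instead of raising.

```python
import importlib


def version_report() -> dict:
    """Collect the versions that need to agree; unavailable items stay None."""
    report = {
        "torch": None,
        "torchvision": None,
        "built_for_cuda": None,   # CUDA version PyTorch was built against
        "cuda_available": False,  # whether PyTorch can actually use a GPU
        "gpu_name": None,
        "dedicated_gib": None,    # the card's own memory, not shared memory
    }
    try:
        torch = importlib.import_module("torch")
    except Exception:  # not installed, or failed to load its libraries
        return report

    report["torch"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()

    try:
        report["torchvision"] = importlib.import_module("torchvision").__version__
    except Exception:  # not installed, or incompatible with this torch
        pass

    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        report["dedicated_gib"] = props.total_memory / 1024**3
    return report


for key, value in version_report().items():
    print(f"{key:>15}: {value}")
```

If `built_for_cuda` is `None` or `cuda_available` is `False`, the installed PyTorch build is not using the GPU at all, which points to the install combo rather than to big-sleep.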
