Comments (9)

soumith commented on August 25, 2024

The fake images update too. It's just that the network has converged, and the changes are very minor.

from dcgan.torch.

aferriss commented on August 25, 2024

@soumith Well I would have thought so too, but why is the network converging so quickly? I ran the code on the same data set, but at epoch 100, the fake output looks exactly the same as it did at epoch 2 (mostly noise). This was not the case training on my other machine.

aferriss commented on August 25, 2024

Actually, after running a few more tests, I'm fairly sure this is an issue related to the GPU.

Here are two logs using the same random seed and data: the first is CPU and the second is GPU. The GPU run converges extremely quickly, towards the end of epoch 1. The CPU run never really converges, and the error occasionally spikes.

I also installed cuDNN with the R4 bindings because I was getting the error mentioned in this post. Not sure if this could be related...
karpathy/neuraltalk2#44

GPU Log:

~/dc$ gpu=1 display_id=41 DATA_ROOT=/media/aferriss/SHARED/allImgs/Users/adamferriss/Desktop/dcgan.torch/myimages dataset=folder nThreads=12 th main.lua

Full log: https://gist.github.com/soumith/58cdb2ef9f8e0dbbf71a

aferriss commented on August 25, 2024

OK, after a few frustrating hours I figured it out. I believe there is a problem with cuDNN 4: I downgraded to cuDNN 3 and now everything seems to work fine.

soumith commented on August 25, 2024

Hmm that's weird, I'll look into it too.

shubhtuls commented on August 25, 2024

I also encountered this issue on Ubuntu with GPU training, and disabling the checkpoint save statements at the end of each epoch seemed to fix it.

soumith commented on August 25, 2024

Thanks to Kenneth Marino from CMU, I figured out the bug.
I'm pushing a patch shortly; sorry about this.
The issue is that the flattened parameter buffers returned by the getParameters call become untied from the network after util.save, because util.save typecasts the network to CPU and then back to GPU.
I added util.save / util.load while preparing my code for release, so I did not notice this issue in my experiments.
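
(Editorial illustration, not part of the original comment.) The untying described above can be sketched with a toy analogy in plain Python. This is not Torch code and not the actual patch: TinyNet, get_parameters and cast_roundtrip are hypothetical names invented here, and a memoryview over an array stands in for a flattened parameter tensor. The assumption is only that a type cast allocates fresh storage, so a flat view taken earlier keeps pointing at the old memory.

```python
from array import array


class TinyNet:
    """Hypothetical stand-in for a network whose weights live in one flat buffer."""

    def __init__(self, values):
        self.storage = array("f", values)
        self.weight = memoryview(self.storage)

    def get_parameters(self):
        # Flat view that shares memory with the *current* weights.
        return memoryview(self.storage)

    def cast_roundtrip(self):
        # Mimics casting to another type and back (as a save routine might do).
        # Each conversion allocates new storage, so older views become stale.
        as_double = array("d", self.storage)
        self.storage = array("f", as_double)
        self.weight = memoryview(self.storage)


net = TinyNet([1.0, 2.0, 3.0])
flat = net.get_parameters()

flat[0] = 10.0                       # "optimizer step" through the flat view
tied_before = net.weight[0] == 10.0  # the network sees the update

net.cast_roundtrip()                 # checkpoint-style cast and cast back
flat[1] = 20.0                       # update lands in the stale buffer
tied_after = net.weight[1] == 20.0   # the network no longer sees it

flat = net.get_parameters()          # re-flatten after the round trip
flat[2] = 30.0
tied_again = net.weight[2] == 30.0   # updates reach the network again
```

In this sketch the optimizer keeps writing into a buffer the network no longer reads, which would match the symptom of generated samples that stop changing after the first checkpoint. Re-acquiring the flat view after every cast is one way to restore the link; whether the real fix takes that route is not stated in this thread.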

soumith commented on August 25, 2024

The bug is fixed now via: 5b093d3

Sorry about this.

aferriss commented on August 25, 2024

@soumith Thanks so much! Glad to know I'm not crazy 👍
