Comments (6)
Please let me take a look then 😢. Seems like something is broken.
from big-discriminator-batch-spoofing-gan.
Could you please try switching to Python 3.5.6?
Alright, first I created a new env with Python 3.5.6 and reinstalled PyTorch 1.0.1, cudatoolkit 10, etc., but on running I get the following error:
Starting the training process ...
Epoch: 1
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
^CTraceback (most recent call last):
File "train.py", line 310, in <module>
main(parse_arguments())
File "train.py", line 304, in main
fid_batch_size=args.fid_batch_size
File "/home/hans/BBMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 539, in train
while real_data_store.hasnext() and batch_counter < limit:
File "/home/hans/BBMSG-GAN/sourcecode/MSG_GAN/utils/iter_utils.py", line 31, in hasnext
self._thenext = next(self.it)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
idx, batch = self._get_batch()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
return self.data_queue.get()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/queues.py", line 113, in get
return ForkingPickler.loads(res)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 256, in rebuild_storage_fd
fd = df.detach()
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/home/hans/.conda/envs/bbmsg/lib/python3.5/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
I found some answers online saying that either upgrading to Python 3.7 or setting num_workers=0 fixes it. However, running again with num_workers=0 left me in the same situation, with zombie processes still sitting on my GPU.
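For anyone hitting the same "ValueError: signal number 32 out of range": the call that fails in the traceback above is signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG)). Below is a minimal, standard-library-only sketch for checking which signal numbers the local interpreter and C library refuse to add to a mask. The helper name probe_blockable_signals is made up for illustration and is not part of this repo or of PyTorch. It is only a diagnostic for Unix-like systems, and it does not by itself fix the DataLoader problem or the leftover GPU processes.

```python
import signal
import warnings


def probe_blockable_signals():
    """Try each signal number in range(1, NSIG) on its own.

    Returns (accepted, rejected): the numbers pthread_sigmask took
    without complaint, and the ones it refused.
    """
    accepted, rejected = [], []
    with warnings.catch_warnings():
        # Some Python versions warn instead of raising for numbers the C
        # library rejects, so turn warnings into exceptions to catch both.
        warnings.simplefilter("error")
        for signum in range(1, signal.NSIG):
            try:
                # SIG_BLOCK returns the mask that was active before the call.
                old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [signum])
            except (ValueError, RuntimeWarning):
                rejected.append(signum)
            else:
                # Put the previous mask back so the probe has no lasting effect.
                signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
                accepted.append(signum)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = probe_blockable_signals()
    print("NSIG:", signal.NSIG)
    print("rejected signal numbers:", bad)
```

If the rejected list is non-empty inside the env that runs train.py, the range(1, signal.NSIG) idiom in resource_sharer.py would fail there in the same way, which would point at the interpreter and system library combination rather than at the training code.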
Hmmm ... 😮. Can you try upgrading pytorch to 1.1.0?
No joy, still the same threading errors :(
My entire env, for reference:
# packages in environment at /home/hans/.conda/envs/bbmsg:
#
# Name              Version       Build                            Channel
_libgcc_mutex       0.1           main
blas                1.0           mkl
ca-certificates     2019.5.15     1                                anaconda
certifi             2018.8.24     py35_1                           anaconda
cffi                1.11.5        py35he75722e_1
cudatoolkit         10.0.130      0
freetype            2.9.1         h8a8886c_1
intel-openmp        2019.4        243
jpeg                9b            h024ee3a_2
libedit             3.1.20181209  hc058e9b_0
libffi              3.2.1         hd88cf55_4
libgcc-ng           9.1.0         hdf63c60_0
libgfortran-ng      7.3.0         hdf63c60_0
libpng              1.6.37        hbc83047_0
libstdcxx-ng        9.1.0         hdf63c60_0
libtiff             4.0.10        h2733197_2
mkl                 2019.4        243
ncurses             6.1           he6710b0_1
ninja               1.8.2         py35h6bb024c_1
numpy               1.14.2        py35hdbf6ddf_0
olefile             0.46          py35_0
openssl             1.0.2t        h7b6447c_1                       anaconda
pillow              5.2.0         py35heded4f4_0
pip                 10.0.1        py35_0
protobuf            3.9.1         pypi_0                           pypi
pycparser           2.19          py35_0
python              3.5.6         hc3d631a_0
pytorch             1.1.0         py3.5_cuda10.0.130_cudnn7.5.1_0  pytorch
readline            7.0           h7b6447c_5
scipy               1.3.1         pypi_0                           pypi
setuptools          40.2.0        py35_0
six                 1.11.0        py35_1
sqlite              3.29.0        h7b6447c_0
tensorboardx        1.8           pypi_0                           pypi
tk                  8.6.8         hbc83047_0
torchvision         0.3.0         py35_cu10.0.130_1                pytorch
tqdm                4.35.0        pypi_0                           pypi
wheel               0.31.1        py35_0
xz                  5.2.4         h14c3975_4
zlib                1.2.11        h7b6447c_3
zstd                1.3.7         h0b5b093_0