Comments (23)

SkyTNT avatar SkyTNT commented on May 29, 2024

The default training command is what I'm using.
The dataset is https://huggingface.co/datasets/projectlosangeles/Los-Angeles-MIDI-Dataset

from midi-model.

SkyTNT avatar SkyTNT commented on May 29, 2024
  • You cannot resume training from an ONNX file or load weights from one. What you need is the ckpt file.
  • The dataset is divided into a training set and a validation set. The number of samples in the validation set is set by --data-val-split (default 128); the remaining samples form the training set.
  • The number of MIDI files is too small to train on at all; you need at least 1,000.
  • Are you sure you have enough video memory for such a large batch size? I have 24 GB and can only use a batch size of 2.
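
To make the split described above concrete, here is a minimal sketch in Python. It assumes the split is a simple shuffle followed by a slice; the function name split_midi_files, the seed, and the MIN_FILES check are illustrative and are not taken from train.py.

```python
import random

MIN_FILES = 1000  # rough lower bound suggested in this thread, not a hard rule


def split_midi_files(paths, val_split=128, seed=0):
    """Return (train, val): val holds `val_split` files, train holds the rest."""
    if len(paths) <= val_split:
        raise ValueError(
            f"{len(paths)} files cannot fill a validation set of {val_split}"
        )
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    val = shuffled[:val_split]
    train = shuffled[val_split:]
    return train, val


if __name__ == "__main__":
    files = [f"song_{i}.mid" for i in range(1500)]
    train, val = split_midi_files(files)
    print(len(train), len(val))  # 1372 128
    if len(files) < MIN_FILES:
        print("warning: dataset is probably too small to train on")
```

With only a handful of MIDI files, the validation set alone would consume the whole dataset, which is one reason a small collection cannot be trained on.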

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

I have an RTX 2060, not sure if that's enough. Also I forgot about the ckpt version of the midi model, thanks for reminding me lol.

I feel at least a basic tutorial on training would help.

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

Now PyTorch is giving me a "MisconfigurationException: No supported gpu backend found!" error. This appears to be a known error with Lightning, but downgrading to version 1.7.7 didn't fix it the way it did for someone else.

Also, I'm not sure the --accelerator option does anything, because accelerator="gpu" is set on line 356. I get the same error even when I tell it to use my CPU.
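
The concern above is that a value written directly into the Trainer call would win over the command-line flag. The sketch below illustrates the difference with plain argparse. The helper trainer_kwargs and the "auto" default are hypothetical and do not reproduce train.py.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical default; the real script may use a different one.
    parser.add_argument("--accelerator", default="auto",
                        choices=["auto", "cpu", "gpu"])
    return parser.parse_args(argv)


def trainer_kwargs(opt, hard_coded=False):
    """Build the keyword arguments that would be passed to a Trainer."""
    if hard_coded:
        # The flag is parsed but ignored: this is the suspected behaviour.
        return {"accelerator": "gpu"}
    # The flag is forwarded, so --accelerator cpu takes effect.
    return {"accelerator": opt.accelerator}


if __name__ == "__main__":
    opt = parse_args(["--accelerator", "cpu"])
    print(trainer_kwargs(opt, hard_coded=True))   # {'accelerator': 'gpu'}
    print(trainer_kwargs(opt, hard_coded=False))  # {'accelerator': 'cpu'}
```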

SkyTNT avatar SkyTNT commented on May 29, 2024

Have you installed PyTorch with CUDA support?
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

Yes, I did that a while before writing those replies.

SkyTNT avatar SkyTNT commented on May 29, 2024

Please confirm what version you have installed.
pip show torch
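
pip show torch reports only the version number, which may not reveal whether the installed wheel was built with CUDA. A quick runtime check, sketched below, can help. The helper name describe_torch_build is illustrative, and it returns early if torch cannot be imported.

```python
def describe_torch_build():
    """Summarise whether the installed torch build can see CUDA."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds, otherwise a string such as "11.7".
        "built_with_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the wheel is a CPU-only build and no amount of driver or toolkit installation will make the GPU visible to it.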

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

I have version 2.0.1.

SkyTNT avatar SkyTNT commented on May 29, 2024

Have you installed CUDA?

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2.

ArgeNH avatar ArgeNH commented on May 29, 2024

@Epictyphlosion can you share your training dataset? I have the same issue.

SkyTNT avatar SkyTNT commented on May 29, 2024

"Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2."

I mean the NVIDIA CUDA Toolkit, not cuda-python.

ArgeNH avatar ArgeNH commented on May 29, 2024

@SkyTNT can you share the correct command for training, or an example training dataset?

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

"I mean nvidia cuda toolkit, not cuda-python."

I didn't have that. I've installed it from this link: https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_537.13_windows.exe

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

"the default training command is what i'm using."

I can't find the default command anywhere. I'm still getting the same "no supported GPU backend" error even after installing cuda toolkit.

SkyTNT avatar SkyTNT commented on May 29, 2024

The default command is python train.py.
Also, please provide the list of packages installed in your environment via pip list.

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

That's the command I was using.
My packages are as follows

absl-py 1.4.0
aiobotocore 2.5.4
aiofiles 23.2.1
aiohttp 3.8.5
aioitertools 0.11.0
aiosignal 1.3.1
altair 5.1.1
annotated-types 0.5.0
ansicon 1.89.0
anyio 3.7.1
arrow 1.2.3
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
backoff 2.2.1
bandcamp-downloader 0.0.13
beautifulsoup4 4.11.1
blessed 1.20.0
botocore 1.31.17
cachetools 5.3.1
certifi 2022.6.15
chardet 5.0.0
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.1.1
croniter 1.3.15
cuda-python 12.2.0
cycler 0.11.0
Cython 3.0.2
datasets 2.14.5
dateutils 0.6.12
deepdiff 6.5.0
demjson3 3.0.5
dill 0.3.7
dnspython 2.4.2
docopt 0.6.2
email-validator 2.0.0.post2
exceptiongroup 1.1.3
fastapi 0.103.1
ffmpy 0.3.1
filelock 3.12.4
fire 0.5.0
flatbuffers 23.5.26
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.6.0
gast 0.5.4
google-auth 2.22.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
gradio 3.44.4
gradio_client 0.5.1
grpcio 1.56.0
h11 0.14.0
h5py 3.9.0
httpcore 0.18.0
httptools 0.6.0
httpx 0.25.0
huggingface-hub 0.17.2
humanfriendly 10.0
idna 3.3
importlib-resources 6.1.0
inquirer 3.1.3
itsdangerous 2.1.2
Jinja2 3.1.2
jinxed 1.2.0
jmespath 1.0.1
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
keras 2.14.0
kiwisolver 1.4.5
libclang 16.0.6
lightning 2022.9.22
lightning-app 0.6.2
lightning-cloud 0.5.7
lightning-utilities 0.3.0
lxml 4.9.1
Markdown 3.4.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
mdurl 0.1.2
ml-dtypes 0.2.0
mock 4.0.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
mutagen 1.45.1
networkx 3.1
numpy 1.25.1
oauthlib 3.2.2
opt-einsum 3.3.0
optimum 1.13.2
ordered-set 4.1.0
orjson 3.9.7
packaging 23.1
pandas 2.1.1
Pillow 10.0.1
pip 23.2.1
protobuf 4.23.4
psutil 5.9.5
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 2.1.1
pydantic_core 2.4.0
pydantic-extra-types 2.1.0
pydantic-settings 2.0.3
pyDeprecate 0.3.2
pydub 0.25.1
pyFluidSynth 1.3.2
Pygments 2.16.1
PyJWT 2.8.0
pyparsing 3.1.1
pyreadline3 3.4.1
python-dateutil 2.8.2
python-dotenv 1.0.0
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 1.7.7
pytz 2023.3.post1
PyYAML 6.0.1
readchar 4.0.5
referencing 0.30.2
regex 2023.8.8
requests 2.28.1
requests-oauthlib 1.3.1
rich 13.5.3
rpds-py 0.10.3
rsa 4.9
s3fs 2023.6.0
safetensors 0.3.3
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 63.2.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.3.2.post1
starlette 0.27.0
starsessions 1.3.0
sympy 1.12
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-intel 2.14.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.3.0
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
torchaudio 2.0.2+cu117
torchmetrics 0.11.4
torchvision 0.15.2+cu117
tqdm 4.66.1
traitlets 5.10.0
transformers 4.33.2
typing_extensions 4.8.0
tzdata 2023.3
ujson 5.8.0
unicode-slugify 0.1.5
Unidecode 1.3.4
urllib3 1.26.12
uvicorn 0.23.2
watchfiles 0.20.0
wcwidth 0.2.6
websocket-client 1.6.3
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.40.0
wrapt 1.14.1
xxhash 3.3.0
yarl 1.9.2

SkyTNT avatar SkyTNT commented on May 29, 2024

Your torch is not a CUDA build: the list shows torch 2.0.1 with no +cu117 suffix, unlike torchaudio and torchvision.
Please install the CUDA build via pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
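
The diagnosis above can be read off the package list: torchaudio and torchvision carry a +cu117 local version tag, while torch is plain 2.0.1. A small sketch of that check follows. It assumes wheels from the PyTorch index carry a +cuXXX tag, which matches the versions in this thread but is a heuristic rather than a guarantee, since some platforms ship CUDA-enabled wheels without the tag. The names cuda_tag and untagged_torch_packages are hypothetical.

```python
import re

# Three lines copied from the package list posted earlier in the thread.
PIP_LIST = """\
torch 2.0.1
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
"""


def cuda_tag(version):
    """Return the CUDA tag (e.g. 'cu117') from a version string, or None."""
    match = re.search(r"\+(cu\d+)$", version)
    return match.group(1) if match else None


def untagged_torch_packages(pip_list_text):
    """Names of torch-family packages whose version has no +cuXXX tag."""
    family = {"torch", "torchaudio", "torchvision"}
    missing = []
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip headers, separators, and blank lines
        name, version = parts
        if name in family and cuda_tag(version) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(untagged_torch_packages(PIP_LIST))  # ['torch']
```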

SkyTNT avatar SkyTNT commented on May 29, 2024

You may also need to update pytorch-lightning via pip install -U pytorch-lightning

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

It got past the accelerator checking, and now I get another error as usual...

---start train---
[rank: 0] Seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\MainUser\Downloads\midi-generator-app-gpu\train.py", line 369, in
    trainer.fit(model, train_dataloader, val_dataloader, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 147, in setup_environment
    self.setup_distributed()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 198, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "C:\Python\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

SkyTNT avatar SkyTNT commented on May 29, 2024

You need to install NCCL: https://developer.nvidia.com/nccl

Epictyphlosion avatar Epictyphlosion commented on May 29, 2024

None of the installers are for Windows
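
NCCL is not offered for Windows, and PyTorch's Windows builds do not include it, which is why the error appears there. PyTorch does ship the Gloo backend on Windows, so one possible workaround is to request Gloo instead of NCCL. The helper below is a sketch: pick_process_group_backend is a hypothetical name, and how its result would be wired into train.py is left open.

```python
import platform


def pick_process_group_backend(system=None):
    """Choose a torch.distributed backend name that should exist on this OS."""
    system = system or platform.system()
    if system == "Windows":
        # PyTorch's Windows builds do not bundle NCCL; Gloo is supported.
        return "gloo"
    try:
        import torch.distributed as dist
    except ImportError:
        return "gloo"  # conservative fallback when torch is not installed
    if dist.is_available() and dist.is_nccl_available():
        return "nccl"
    return "gloo"


if __name__ == "__main__":
    print(pick_process_group_backend("Windows"))  # gloo
    print(pick_process_group_backend())
```

In recent Lightning versions the backend can be passed as DDPStrategy(process_group_backend="gloo"). On a single-GPU machine, avoiding the DDP strategy altogether is another option. Neither suggestion was confirmed in this thread.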

SkyTNT avatar SkyTNT commented on May 29, 2024

You can search Google for the error message:
RuntimeError: Distributed package doesn't have NCCL built in

I can't help any further because I don't have time.
