I keep encountering the same error every time I try to run the training:

Error when training,about skytnt/midi-model

SkyTNT commented on May 29, 2024

The default training command is what I'm using.
The dataset is https://huggingface.co/datasets/projectlosangeles/Los-Angeles-MIDI-Dataset

from midi-model.

SkyTNT commented on May 29, 2024

You cannot resume training from an ONNX file or load one as a checkpoint. What you need is the ckpt file.
The dataset is divided into a training set and a validation set. The number of samples in the validation set is set by --data-val-split (default 128); the remaining samples form the training set.
The number of MIDI files is too small to train on at all; you need at least 1k.
Are you sure you have enough video memory for such a high batch size? I have 24 GB and can only use a batch size of 2.
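
To make the split concrete, here is a minimal sketch of holding out a fixed number of validation samples. The function name `split_dataset`, the shuffling step, and the seed are assumptions for illustration; they are not taken from train.py. Only the idea of a fixed-size validation set (default 128) comes from the comment above.

```python
import random


def split_dataset(samples, val_split=128, seed=0):
    """Hold out `val_split` samples for validation; the rest is training data.

    Hypothetical helper: shuffling before the split is an assumption.
    """
    if val_split >= len(samples):
        # With too few files, nothing would be left to train on.
        raise ValueError(
            f"need more than {val_split} samples, got {len(samples)}"
        )
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[val_split:], shuffled[:val_split]  # (train, val)


# Stand-in file names; a real run would list the MIDI files on disk.
midi_files = [f"song_{i}.mid" for i in range(1000)]
train_files, val_files = split_dataset(midi_files, val_split=128)
print(len(train_files), len(val_files))  # 872 128
```

With a dataset of 128 files or fewer and the default split size, the training set in this sketch would be empty, which is one reason a very small collection cannot work.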

Epictyphlosion commented on May 29, 2024

I have an RTX 2060, not sure if that's enough. Also I forgot about the ckpt version of the midi model, thanks for reminding me lol.

I feel at least a basic tutorial on training would help.

Epictyphlosion commented on May 29, 2024

Now pytorch is giving me a MisconfigurationException: No supported gpu backend found! error. This appears to be a known error with lightning, but downgrading to version 1.7.7 doesn't fix anything like it did for someone else.

Also, I'm not sure if the --accelerator option does anything because of the accelerator="gpu", on line 356. I get the same error even when I tell it to use my CPU.
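
For anyone checking the same thing, a rough sketch of how a command-line option can be forwarded instead of hard-coded is below. The helper `build_trainer_kwargs` and the `--devices` flag are hypothetical and are not taken from train.py. The assumption is that the resulting dictionary would be unpacked into the Lightning `Trainer` constructor, which accepts `accelerator` and `devices` keyword arguments.

```python
import argparse


def build_trainer_kwargs(argv=None):
    """Parse CLI flags and return keyword arguments for a Trainer.

    Hypothetical helper: the flag names and defaults are assumptions.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--accelerator", default="gpu",
                        choices=["cpu", "gpu", "auto"])
    parser.add_argument("--devices", type=int, default=1)
    opt = parser.parse_args(argv)
    # Forward the parsed value rather than a literal accelerator="gpu",
    # so that passing --accelerator cpu actually takes effect.
    return {"accelerator": opt.accelerator, "devices": opt.devices}


print(build_trainer_kwargs(["--accelerator", "cpu"]))
# {'accelerator': 'cpu', 'devices': 1}
```

If the script passes a literal value to the trainer and ignores the parsed option, the flag has no effect, which would match the behaviour described above.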

SkyTNT commented on May 29, 2024

Did you install PyTorch with CUDA support?
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Epictyphlosion commented on May 29, 2024

Yes, I did that a while before writing those replies.

SkyTNT commented on May 29, 2024

Please confirm what version you have installed.
pip show torch

Epictyphlosion commented on May 29, 2024

I have version 2.0.1.

SkyTNT commented on May 29, 2024

Have you installed CUDA?

Epictyphlosion commented on May 29, 2024

Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2.

ArgeNH commented on May 29, 2024

@Epictyphlosion can you share your training dataset? I have the same issue.

SkyTNT commented on May 29, 2024

> Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2.

I mean the NVIDIA CUDA Toolkit, not cuda-python.

ArgeNH commented on May 29, 2024

@SkyTNT can you share the correct command for training, or an example training dataset?

Epictyphlosion commented on May 29, 2024

> I mean nvidia cuda toolkit, not cuda-python.

I didn't have that. I've installed it from this link: https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_537.13_windows.exe

Epictyphlosion commented on May 29, 2024

> the default training command is what i'm using.

I can't find the default command anywhere. I'm still getting the same "no supported GPU backend" error even after installing cuda toolkit.

SkyTNT commented on May 29, 2024

The default command is python train.py.
Also, please provide the list of packages installed in your environment via pip list.

Epictyphlosion commented on May 29, 2024

That's the command I was using.
My packages are as follows:

absl-py 1.4.0
aiobotocore 2.5.4
aiofiles 23.2.1
aiohttp 3.8.5
aioitertools 0.11.0
aiosignal 1.3.1
altair 5.1.1
annotated-types 0.5.0
ansicon 1.89.0
anyio 3.7.1
arrow 1.2.3
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
backoff 2.2.1
bandcamp-downloader 0.0.13
beautifulsoup4 4.11.1
blessed 1.20.0
botocore 1.31.17
cachetools 5.3.1
certifi 2022.6.15
chardet 5.0.0
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.1.1
croniter 1.3.15
cuda-python 12.2.0
cycler 0.11.0
Cython 3.0.2
datasets 2.14.5
dateutils 0.6.12
deepdiff 6.5.0
demjson3 3.0.5
dill 0.3.7
dnspython 2.4.2
docopt 0.6.2
email-validator 2.0.0.post2
exceptiongroup 1.1.3
fastapi 0.103.1
ffmpy 0.3.1
filelock 3.12.4
fire 0.5.0
flatbuffers 23.5.26
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.6.0
gast 0.5.4
google-auth 2.22.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
gradio 3.44.4
gradio_client 0.5.1
grpcio 1.56.0
h11 0.14.0
h5py 3.9.0
httpcore 0.18.0
httptools 0.6.0
httpx 0.25.0
huggingface-hub 0.17.2
humanfriendly 10.0
idna 3.3
importlib-resources 6.1.0
inquirer 3.1.3
itsdangerous 2.1.2
Jinja2 3.1.2
jinxed 1.2.0
jmespath 1.0.1
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
keras 2.14.0
kiwisolver 1.4.5
libclang 16.0.6
lightning 2022.9.22
lightning-app 0.6.2
lightning-cloud 0.5.7
lightning-utilities 0.3.0
lxml 4.9.1
Markdown 3.4.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
mdurl 0.1.2
ml-dtypes 0.2.0
mock 4.0.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
mutagen 1.45.1
networkx 3.1
numpy 1.25.1
oauthlib 3.2.2
opt-einsum 3.3.0
optimum 1.13.2
ordered-set 4.1.0
orjson 3.9.7
packaging 23.1
pandas 2.1.1
Pillow 10.0.1
pip 23.2.1
protobuf 4.23.4
psutil 5.9.5
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 2.1.1
pydantic_core 2.4.0
pydantic-extra-types 2.1.0
pydantic-settings 2.0.3
pyDeprecate 0.3.2
pydub 0.25.1
pyFluidSynth 1.3.2
Pygments 2.16.1
PyJWT 2.8.0
pyparsing 3.1.1
pyreadline3 3.4.1
python-dateutil 2.8.2
python-dotenv 1.0.0
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 1.7.7
pytz 2023.3.post1
PyYAML 6.0.1
readchar 4.0.5
referencing 0.30.2
regex 2023.8.8
requests 2.28.1
requests-oauthlib 1.3.1
rich 13.5.3
rpds-py 0.10.3
rsa 4.9
s3fs 2023.6.0
safetensors 0.3.3
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 63.2.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.3.2.post1
starlette 0.27.0
starsessions 1.3.0
sympy 1.12
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-intel 2.14.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.3.0
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
torchaudio 2.0.2+cu117
torchmetrics 0.11.4
torchvision 0.15.2+cu117
tqdm 4.66.1
traitlets 5.10.0
transformers 4.33.2
typing_extensions 4.8.0
tzdata 2023.3
ujson 5.8.0
unicode-slugify 0.1.5
Unidecode 1.3.4
urllib3 1.26.12
uvicorn 0.23.2
watchfiles 0.20.0
wcwidth 0.2.6
websocket-client 1.6.3
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.40.0
wrapt 1.14.1
xxhash 3.3.0
yarl 1.9.2

SkyTNT commented on May 29, 2024

Your torch is not a CUDA build: the list shows torch 2.0.1 without the +cu117 suffix that torchaudio and torchvision carry.
Please install the CUDA build via pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
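
As an illustration of how the mismatch can be spotted in the package list, here is a small sketch that extracts the local version tag (the part after `+`) from `pip list` style text. The helper `local_tag` is hypothetical. A missing tag is only a hint, not proof, since some wheels carry no tag at all; calling `torch.cuda.is_available()` in the same environment is the more direct check.

```python
def local_tag(pip_list_text, package):
    """Return the local version tag of `package` ('' if it has none).

    Returns None when the package is not listed. Hypothetical helper.
    """
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() == package.lower():
            # "2.0.2+cu117" -> "cu117"; "2.0.1" -> ""
            return parts[1].partition("+")[2]
    return None


# The three relevant rows from the package list posted above.
PIP_LIST = """\
torch 2.0.1
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
"""

for name in ("torch", "torchaudio", "torchvision"):
    print(name, repr(local_tag(PIP_LIST, name)))
# torch ''
# torchaudio 'cu117'
# torchvision 'cu117'
```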

SkyTNT commented on May 29, 2024

You may also need to update pytorch-lightning via pip install -U pytorch-lightning

Epictyphlosion commented on May 29, 2024

It got past the accelerator checking, and now I get another error as usual...

---start train---
[rank: 0] Seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\MainUser\Downloads\midi-generator-app-gpu\train.py", line 369, in <module>
    trainer.fit(model, train_dataloader, val_dataloader, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 147, in setup_environment
    self.setup_distributed()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 198, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "C:\Python\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

SkyTNT commented on May 29, 2024

You need to install https://developer.nvidia.com/nccl

Epictyphlosion commented on May 29, 2024

None of the installers are for Windows

SkyTNT commented on May 29, 2024

You can search Google for the error message:
RuntimeError: Distributed package doesn't have NCCL built in

I can't help any further because I don't have time.
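
For readers who land here with the same error on Windows: NCCL is not available on that platform, while PyTorch's gloo backend is. The sketch below shows one way to choose a backend by platform. The helper `pick_backend` is hypothetical and is not part of train.py. In a real script the `nccl_available` value could come from `torch.distributed.is_nccl_available()`, and the result could be passed to Lightning as `DDPStrategy(process_group_backend=...)`. Whether train.py needs a distributed strategy at all on a single GPU is not something this thread settles.

```python
import sys


def pick_backend(platform=sys.platform, nccl_available=False):
    """Choose a torch.distributed backend name for this machine.

    Hypothetical helper: falls back to gloo whenever NCCL cannot be used.
    """
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


print(pick_backend("win32"))                      # gloo
print(pick_backend("linux", nccl_available=True)) # nccl
```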
