Comments (23)
The default training command is what I'm using.
The dataset is https://huggingface.co/datasets/projectlosangeles/Los-Angeles-MIDI-Dataset
from midi-model.
- You cannot resume training from an ONNX file or load one for training. What you need is a ckpt file.
- The dataset is split into a training set and a validation set. The number of samples in the validation set is set by --data-val-split (default 128); the remaining samples form the training set.
- The number of MIDI files is too small to train on at all; you need at least 1k.
- Are you sure you have enough video memory for such a large batch size? I have 24 GB and can only use a batch size of 2.
from midi-model.
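To make the split described above concrete, here is a minimal sketch of how a fixed-size validation split could work. The helper name split_dataset, the fixed-seed shuffle, and the hard error are assumptions for illustration and not midi-model's actual code; only the default of 128 validation samples and the rough 1k minimum come from the reply above.

```python
import random


def split_dataset(files, val_split=128, min_files=1000, seed=0):
    """Hypothetical helper: hold out `val_split` files for validation."""
    if len(files) < min_files:
        raise ValueError(f"only {len(files)} MIDI files; need at least {min_files}")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle (assumed)
    # First `val_split` files go to validation, the rest to training.
    return shuffled[val_split:], shuffled[:val_split]


train, val = split_dataset([f"song_{i}.mid" for i in range(1500)])
print(len(train), len(val))  # 1372 128
```

With only a few dozen files, a split like this would leave nothing for training once 128 samples are held out, which is one way to read the "at least 1k" advice.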
I have an RTX 2060, not sure if that's enough. Also I forgot about the ckpt version of the midi model, thanks for reminding me lol.
I feel at least a basic tutorial on training would help.
from midi-model.
Now PyTorch is giving me a "MisconfigurationException: No supported gpu backend found!" error. This appears to be a known error with lightning, but downgrading to version 1.7.7 doesn't fix anything like it did for someone else.
Also, I'm not sure if the --accelerator option does anything because of the accelerator="gpu" on line 356. I get the same error even when I tell it to use my CPU.
from midi-model.
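For readers wondering how a command-line flag can end up ignored: if the script parses --accelerator but then passes a literal "gpu" to the trainer, the flag has no effect. The sketch below shows the flag being forwarded instead. It is a hypothetical illustration, not the contents of train.py; the function name trainer_kwargs, the default, and the choices are assumptions.

```python
import argparse


def trainer_kwargs(argv=None):
    """Hypothetical sketch: forward --accelerator instead of hardcoding "gpu"."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--accelerator", default="gpu", choices=["cpu", "gpu"])
    opt = parser.parse_args(argv)
    # The parsed value is what reaches the trainer, not a literal string.
    return {"accelerator": opt.accelerator}


print(trainer_kwargs(["--accelerator", "cpu"]))  # {'accelerator': 'cpu'}
print(trainer_kwargs([]))  # {'accelerator': 'gpu'}
```

A dictionary like this could then be unpacked into the Lightning Trainer, which does accept an accelerator argument.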
Have you installed PyTorch with CUDA support?
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
from midi-model.
Yes, I did that a while before writing those replies.
from midi-model.
Please confirm what version you have installed.
pip show torch
from midi-model.
I have version 2.0.1.
from midi-model.
Have you installed CUDA?
from midi-model.
Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2.
from midi-model.
@Epictyphlosion can you share the dataset you are training on? I have the same issue.
from midi-model.
Yes, I have torch with cuda 11.7 as well as cuda-python 3.0.2.
I mean nvidia cuda toolkit, not cuda-python.
from midi-model.
@SkyTNT can you share the correct command for training, or an example training dataset?
from midi-model.
I mean nvidia cuda toolkit, not cuda-python.
I didn't have that. I've installed it from this link: https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda_12.2.2_537.13_windows.exe
from midi-model.
the default training command is what i'm using.
I can't find the default command anywhere. I'm still getting the same "no supported GPU backend" error even after installing cuda toolkit.
from midi-model.
The default command is python train.py.
Please also provide the list of packages installed in your environment via pip list.
from midi-model.
That's the command I was using.
My packages are as follows
absl-py 1.4.0
aiobotocore 2.5.4
aiofiles 23.2.1
aiohttp 3.8.5
aioitertools 0.11.0
aiosignal 1.3.1
altair 5.1.1
annotated-types 0.5.0
ansicon 1.89.0
anyio 3.7.1
arrow 1.2.3
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
backoff 2.2.1
bandcamp-downloader 0.0.13
beautifulsoup4 4.11.1
blessed 1.20.0
botocore 1.31.17
cachetools 5.3.1
certifi 2022.6.15
chardet 5.0.0
charset-normalizer 2.1.1
click 8.1.7
colorama 0.4.6
coloredlogs 15.0.1
contourpy 1.1.1
croniter 1.3.15
cuda-python 12.2.0
cycler 0.11.0
Cython 3.0.2
datasets 2.14.5
dateutils 0.6.12
deepdiff 6.5.0
demjson3 3.0.5
dill 0.3.7
dnspython 2.4.2
docopt 0.6.2
email-validator 2.0.0.post2
exceptiongroup 1.1.3
fastapi 0.103.1
ffmpy 0.3.1
filelock 3.12.4
fire 0.5.0
flatbuffers 23.5.26
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.6.0
gast 0.5.4
google-auth 2.22.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
gradio 3.44.4
gradio_client 0.5.1
grpcio 1.56.0
h11 0.14.0
h5py 3.9.0
httpcore 0.18.0
httptools 0.6.0
httpx 0.25.0
huggingface-hub 0.17.2
humanfriendly 10.0
idna 3.3
importlib-resources 6.1.0
inquirer 3.1.3
itsdangerous 2.1.2
Jinja2 3.1.2
jinxed 1.2.0
jmespath 1.0.1
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
keras 2.14.0
kiwisolver 1.4.5
libclang 16.0.6
lightning 2022.9.22
lightning-app 0.6.2
lightning-cloud 0.5.7
lightning-utilities 0.3.0
lxml 4.9.1
Markdown 3.4.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.0
mdurl 0.1.2
ml-dtypes 0.2.0
mock 4.0.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
mutagen 1.45.1
networkx 3.1
numpy 1.25.1
oauthlib 3.2.2
opt-einsum 3.3.0
optimum 1.13.2
ordered-set 4.1.0
orjson 3.9.7
packaging 23.1
pandas 2.1.1
Pillow 10.0.1
pip 23.2.1
protobuf 4.23.4
psutil 5.9.5
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 2.1.1
pydantic_core 2.4.0
pydantic-extra-types 2.1.0
pydantic-settings 2.0.3
pyDeprecate 0.3.2
pydub 0.25.1
pyFluidSynth 1.3.2
Pygments 2.16.1
PyJWT 2.8.0
pyparsing 3.1.1
pyreadline3 3.4.1
python-dateutil 2.8.2
python-dotenv 1.0.0
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 1.7.7
pytz 2023.3.post1
PyYAML 6.0.1
readchar 4.0.5
referencing 0.30.2
regex 2023.8.8
requests 2.28.1
requests-oauthlib 1.3.1
rich 13.5.3
rpds-py 0.10.3
rsa 4.9
s3fs 2023.6.0
safetensors 0.3.3
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 63.2.0
six 1.16.0
sniffio 1.3.0
soupsieve 2.3.2.post1
starlette 0.27.0
starsessions 1.3.0
sympy 1.12
tensorboard 2.14.1
tensorboard-data-server 0.7.1
tensorflow 2.14.0
tensorflow-estimator 2.14.0
tensorflow-intel 2.14.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.3.0
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
torchaudio 2.0.2+cu117
torchmetrics 0.11.4
torchvision 0.15.2+cu117
tqdm 4.66.1
traitlets 5.10.0
transformers 4.33.2
typing_extensions 4.8.0
tzdata 2023.3
ujson 5.8.0
unicode-slugify 0.1.5
Unidecode 1.3.4
urllib3 1.26.12
uvicorn 0.23.2
watchfiles 0.20.0
wcwidth 0.2.6
websocket-client 1.6.3
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.40.0
wrapt 1.14.1
xxhash 3.3.0
yarl 1.9.2
from midi-model.
Your torch is not a CUDA build.
Please install the CUDA build via pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
from midi-model.
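The clue in the package list is the version suffix: torchaudio and torchvision carry +cu117, while torch is plain 2.0.1. A small sketch of that check follows. The helper name cuda_tag is made up for illustration, and a missing tag is only a hint, not proof; the decisive check is running torch.cuda.is_available() in the environment itself.

```python
def cuda_tag(version):
    """Return the CUDA local-version tag of a wheel version string, or None."""
    _, _, local = version.partition("+")
    return local if local.startswith("cu") else None


# Versions copied from the pip list in this thread.
installed = {
    "torch": "2.0.1",
    "torchaudio": "2.0.2+cu117",
    "torchvision": "0.15.2+cu117",
}
for name, version in installed.items():
    print(name, cuda_tag(version))
```

Here torch is the only one of the three without a cu117 tag, which matches the diagnosis above.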
Maybe you also need to update pytorch-lightning via pip install -U pytorch-lightning
from midi-model.
It got past the accelerator checking, and now I get another error as usual...
---start train---
[rank: 0] Seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [license.piriform.com]:61321 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\MainUser\Downloads\midi-generator-app-gpu\train.py", line 369, in <module>
    trainer.fit(model, train_dataloader, val_dataloader, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\launchers\subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Python\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 947, in _run
    self.strategy.setup_environment()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 147, in setup_environment
    self.setup_distributed()
  File "C:\Python\lib\site-packages\pytorch_lightning\strategies\ddp.py", line 198, in setup_distributed
    _init_dist_connection(self.cluster_environment, self._process_group_backend, timeout=self._timeout)
  File "C:\Python\lib\site-packages\lightning_fabric\utilities\distributed.py", line 290, in _init_dist_connection
    torch.distributed.init_process_group(torch_distributed_backend, rank=global_rank, world_size=world_size, **kwargs)
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 907, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1013, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
from midi-model.
You need to install https://developer.nvidia.com/nccl
from midi-model.
None of the installers are for Windows
from midi-model.
You can search on Google:
RuntimeError: Distributed package doesn't have NCCL built in
I can't help you any more because I don't have time.
from midi-model.
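A note for anyone landing here with the same NCCL error on Windows: NCCL is a Linux-only library and is not built into PyTorch's Windows wheels, whereas the gloo backend is. A minimal sketch of choosing a backend by platform is below. The helper name pick_backend is hypothetical, and how the result gets wired into train.py is an assumption, not something this thread confirms.

```python
import sys


def pick_backend(platform=sys.platform):
    """Pick a torch.distributed backend name; NCCL is only available on Linux."""
    return "nccl" if platform.startswith("linux") else "gloo"


print(pick_backend("win32"))  # gloo
print(pick_backend("linux"))  # nccl
```

In recent Lightning versions the chosen name could plausibly be passed as DDPStrategy(process_group_backend=...), assuming the installed version has that parameter. Training on a single device without a distributed strategy may sidestep the issue as well.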