The following error occurs when running lm-eval with accelerate on a sharded RWKV model:
# ------------------------------
# Running Task : anli
# ------------------------------
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `7`
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in `--num_processes=1`.
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Using RTX 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled.
2024-02-25:19:16:47,809 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,809 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,887 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,887 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,897 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,897 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,904 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,905 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,905 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,905 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,906 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,906 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:47,937 INFO [utils.py:145] Note: detected 160 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2024-02-25:19:16:47,937 INFO [utils.py:148] Note: NumExpr detected 160 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-02-25:19:16:48,003 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,088 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,097 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,098 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,101 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,102 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:48,110 INFO [config.py:58] PyTorch version 2.1.2 available.
2024-02-25:19:16:49,485 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,485 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,558 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,559 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,568 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,568 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,589 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,589 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,599 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,599 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,607 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,607 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:49,681 INFO [__main__.py:162] Verbosity set to INFO
2024-02-25:19:16:49,681 INFO [__init__.py:358] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-02-25:19:16:54,373 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,373 INFO [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,478 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,478 INFO [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,487 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,487 INFO [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,490 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,490 INFO [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,501 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,501 INFO [__main__.py:239] Loading selected tasks...
2024-02-25:19:16:54,507 INFO [__main__.py:238] Selected Tasks: ['anli']
2024-02-25:19:16:54,507 INFO [__main__.py:239] Loading selected tasks...
exitcode : 1 (pid: 45813)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 45814)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 45815)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 45816)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 45817)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 45818)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 45812)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
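
The summary above shows only which ranks exited and which one failed first, not the underlying traceback. As a triage aid, here is a minimal sketch that pulls the rank, exit code and pid out of a summary like this one and reports the entry listed under "Root Cause". The helper names `failed_ranks` and `root_cause` are hypothetical and not part of lm-eval, accelerate or PyTorch, and the regular expression assumes the field layout seen in this log.

```python
from __future__ import annotations

import re

# Matches the adjacent "rank" and "exitcode" lines of one summary entry.
# Assumption: the fields appear in this order, as they do in the log above.
ENTRY = re.compile(
    r"rank\s*:\s*(?P<rank>\d+)\s*\(local_rank:\s*(?P<local_rank>\d+)\)\s*"
    r"exitcode\s*:\s*(?P<exitcode>-?\d+)\s*\(pid:\s*(?P<pid>\d+)\)"
)


def failed_ranks(summary: str) -> list[dict[str, int]]:
    """Return one dict per worker entry found in the summary text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in ENTRY.finditer(summary)
    ]


def root_cause(summary: str) -> dict[str, int] | None:
    """Return the first entry after the 'Root Cause' marker, if any."""
    _, marker, tail = summary.partition("Root Cause (first observed failure):")
    entries = failed_ranks(tail) if marker else []
    return entries[0] if entries else None


# A shortened excerpt of the summary above, used as sample input.
sample = """\
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 45818)
Root Cause (first observed failure):
[0]:
time : 2024-02-25_19:16:59
host : 0c7aa106d18a
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 45812)
"""

print(root_cause(sample))
```

In this run every rank exits with code 1 at the same timestamp, so the "Root Cause" entry for rank 0 does not by itself single out one GPU. The actual exception still has to come from a traceback, as the linked PyTorch page describes.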