
ovsam's People

Contributors

eltociear, harboryuan, lxtgh, ly015, wookiehangover


ovsam's Issues

How to run inference on a custom dataset

Hi! I solved the environment problem and now want to run inference on my own dataset.
Your code uses COCO data, but how do I test on other data, e.g. images with segmentation label PNGs? The language embeddings are extracted for the COCO classes; will that affect use on other datasets? Can you explain in detail how to run inference on other datasets? Thank you.

Could you provide a detailed environment configuration example? Does CUDA have to be 12.1?

According to your README you installed CUDA 12.1, but according to https://mmcv.readthedocs.io/en/latest/get_started/installation.html#install-with-pip I should install PyTorch 2.1.0 and mmcv 2.1.0. However, that does not seem to satisfy the requirement "Please install mmcv>=2.0.0, <2.1.0." Could you please tell me the correct environment configuration requirements? Thank you!

When I execute "bash tools/dist.sh test seg/configs/sam2clip/sam_vith_dump.py 1", I get this error.

Traceback (most recent call last):
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/config/lazy.py", line 68, in build
    module = importlib.import_module(self._module)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/workspace/ovsam/seg/models/detectors/__init__.py", line 1, in <module>
    from .sam2clip_distill import BackboneDistillation
  File "/workspace/ovsam/seg/models/detectors/sam2clip_distill.py", line 6, in <module>
    from mmdet.models.detectors.base import ForwardResults
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmdet/models/__init__.py", line 3, in <module>
    from .data_preprocessors import *  # noqa: F401,F403
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmdet/models/data_preprocessors/__init__.py", line 6, in <module>
    from .reid_data_preprocessor import ReIDDataPreprocessor
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmdet/models/data_preprocessors/reid_data_preprocessor.py", line 13, in <module>
    import mmpretrain
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmpretrain/__init__.py", line 18, in <module>
    and mmcv_version < digit_version(mmcv_maximum_version)),
AssertionError: MMCV==2.1.0 is used but incompatible. Please install mmcv>=2.0.0, <2.1.0.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/ovsam/tools/test.py", line 177, in <module>
    main()
  File "/workspace/ovsam/tools/test.py", line 141, in main
    runner = Runner.from_cfg(cfg)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/runner/runner.py", line 445, in from_cfg
    runner = cls(
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/runner/runner.py", line 412, in __init__
    self.model = self.build_model(model)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/runner/runner.py", line 819, in build_model
    model = MODELS.build(model)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 96, in build_from_cfg
    obj_type = args.pop('type')
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/config/config.py", line 182, in pop
    return self.build_lazy(super().pop(key, default))
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/config/config.py", line 215, in build_lazy
    value = value.build()
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/mmengine/config/lazy.py", line 70, in build
    raise type(e)(f'Failed to import {self._module} '
AssertionError: Failed to import seg.models.detectors in seg/configs/sam2clip/sam_vith_dump.py, line 5 for MMCV==2.1.0 is used but incompatible. Please install mmcv>=2.0.0, <2.1.0.
[2024-07-05 10:13:04,386] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 40539) of binary: /root/miniconda3/envs/ovsam/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/ovsam/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0', 'console_scripts', 'torchrun')())
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/ovsam/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
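
For reference, a quick way to confirm which versions are actually installed in the environment that raises this assertion (a minimal sketch, not part of the repo):

    import torch
    import mmcv
    import mmdet
    import mmengine

    # Print the versions that the assertion inside mmpretrain checks against.
    print('torch:', torch.__version__, 'cuda:', torch.version.cuda)
    print('mmcv:', mmcv.__version__)
    print('mmdet:', mmdet.__version__)
    print('mmengine:', mmengine.__version__)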

about "Feature-Crop baseline"

Hello, I would like to ask how the "Feature-Crop baseline" mentioned in the paper crops features using a mask? Is there any specific paper that I can refer to?
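
For context, a generic way to "crop" features with a mask is mask-weighted average pooling over the feature map; the sketch below only illustrates that general technique and may not match the paper's exact Feature-Crop baseline:

    import torch

    def masked_pool(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feat: (C, H, W) feature map; mask: (H, W) binary mask at the same resolution.
        # Returns the average feature vector over the masked region.
        mask = mask.float()
        return (feat * mask).sum(dim=(1, 2)) / mask.sum().clamp(min=1.0)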

Where can I get the SAM dataset?

I downloaded the SA-1B dataset, but the website says: NOTE: There are no class labels for the images or mask annotations.
After downloading and decompressing, it is not a JSON file. Where can I download the SAM dataset used in your project?

Text embedding

Hello author,

I want to ask where the text embedding extraction file is. How do you process the text dataset? Thanks!
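
For readers with the same question, the general recipe for turning class names into CLIP text embeddings looks roughly like the sketch below; the open_clip wrapper, model name, and prompt template here are assumptions for illustration, not the repo's actual pipeline:

    import torch
    import open_clip

    # Assumed setup: the open_clip implementation of OpenAI's RN50x16 CLIP.
    model, _, _ = open_clip.create_model_and_transforms('RN50x16', pretrained='openai')
    tokenizer = open_clip.get_tokenizer('RN50x16')

    class_names = ['person', 'dog', 'car']  # your dataset's category names
    with torch.no_grad():
        tokens = tokenizer([f'a photo of a {name}' for name in class_names])
        text_embed = model.encode_text(tokens)
        text_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)  # L2-normalize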

Prompt comparison with SAM

OVSAM's prompts are points and boxes; is that any different from SAM? How is the prompt obtained during training or inference? When testing on images, must the prompt come from a detector, or can a bounding box/rectangle be assigned manually? Can predictions be made by generating dense boxes?
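
For dense prompting at inference time, a common approach (in the spirit of SAM's automatic mask generation) is to sample a regular grid of point prompts; a minimal sketch, not the repo's code:

    import numpy as np

    def build_point_grid(n_per_side: int, img_h: int, img_w: int) -> np.ndarray:
        # Evenly spaced (x, y) point prompts covering the image, similar in spirit
        # to SAM's default 32x32 automatic grid.
        offset = 1.0 / (2 * n_per_side)
        coords = np.linspace(offset, 1.0 - offset, n_per_side)
        xs, ys = np.meshgrid(coords, coords)
        return np.stack([xs.ravel() * img_w, ys.ravel() * img_h], axis=-1)  # (n*n, 2)

    points = build_point_grid(32, 1024, 1024)  # one point prompt per grid cell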

Object level masks

Hi @HarborYuan,

Thank you for the great work! You mentioned in the paper that you concentrate only on object-level masks when using the Segment Anything Model.

  • What was done specifically to obtain only object-level masks and avoid part masks?
  • What size of point grid was used to train the model? Was it different from the original SAM 32x32 grid?
  • Where can I find the list of all classes? Is it possible to restrict the set of classes to only the ones I need? (A related sketch follows below.)

Thank you!
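
On the class-restriction question, recognition can generally be limited to a custom subset by comparing the predicted label embedding against only the text embeddings of the classes of interest; a hedged sketch with made-up tensor names:

    import torch

    def classify_against_subset(label_embed: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # label_embed: (B, C) per-mask label embeddings from the model (hypothetical name).
        # text_embeds: (K, C) CLIP text embeddings for only the classes you care about.
        label_embed = label_embed / label_embed.norm(dim=-1, keepdim=True)
        text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
        logits = label_embed @ text_embeds.t()  # cosine similarity, shape (B, K)
        return logits.argmax(dim=-1)            # index into the restricted class list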

Reduce Train Batch Size

Hi! I want to reduce the training batch size from 2 to 1. How can I do that?
I'm looking forward to your early reply. Thanks!
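
In mmengine-based codebases the per-GPU batch size is normally a dataloader setting in the config; a hedged sketch of the kind of override involved (the base config name and key layout in this repo may differ):

    # Hypothetical override config; point '_base_' at the config you actually train with.
    _base_ = ['./sam2clip_vith_rn50x16.py']

    train_dataloader = dict(batch_size=1)  # reduce the per-GPU batch size from 2 to 1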

The key argument of `Registry.get` must be a str

When I run the inference command, I get the following error. How can I solve it?

Traceback (most recent call last):
  File "/maggie.meng/code/ovsam/tools/test.py", line 177, in <module>
    main()
  File "/maggie.meng/code/ovsam/tools/test.py", line 141, in main
    runner = Runner.from_cfg(cfg)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/runner/runner.py", line 445, in from_cfg
    runner = cls(
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/runner/runner.py", line 412, in __init__
    self.model = self.build_model(model)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/runner/runner.py", line 819, in build_model
    model = MODELS.build(model)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/maggie.meng/code/ovsam/seg/models/detectors/ovsam.py", line 63, in __init__
    self.neck = MODELS.build(neck)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/maggie.meng/code/ovsam/seg/models/necks/transformer_neck.py", line 43, in __init__
    patch_embed = PatchEmbed(
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmdet/models/layers/transformer/utils.py", line 250, in __init__
    self.projection = build_conv_layer(
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmcv/cnn/bricks/conv.py", line 43, in build_conv_layer
    conv_layer = registry.get(layer_type)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/mmengine/registry/registry.py", line 441, in get
    raise TypeError(
TypeError: The key argument of `Registry.get` must be a str, got <class 'type'>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 523118) of binary: /root/anaconda3/envs/ovsam_demo/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/ovsam_demo/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/ovsam_demo/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/test.py FAILED

Failures:
[1]:
time : 2024-08-27_16:34:49
host : 9rqdhjcat3fsm-0
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 523119)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-08-27_16:34:49
host : 9rqdhjcat3fsm-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 523118)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

How to recognize 22,000 classes?

Hi, thank you for your valuable contribution!

I appreciate your work on the ovsam model. In your paper, you mentioned that the model can currently segment and recognize around 22,000 classes. However, when I tested the example provided in the demo, it appears that only approximately 1,000 classes can be recognized. I noticed that the names field is defined in this file.

Could you please clarify whether my understanding is correct? If I have misunderstood, kindly point out the correct information. Thank you very much for your clarification.

How to add a prompt?

I tried the demo on Hugging Face, but it only supports clicking somewhere in the picture to run inference. Can it support a text prompt instead of clicking on the picture?

novel_score is very low (checkpoint evaluation on test)

Hi. Thanks for the code and paper.

When I evaluate the provided checkpoint with the codebase, I am able to reproduce all the COCO values that were reported in the paper.

But I have a question about the values printed in the terminal. Is the novel_score the accuracy on class prediction for the novel classes? Why is novel_score so low compared to base_score?

mmengine - INFO - Epoch(test) [1209/1209]    miou: 0.6791  base_iou: 0.6835  novel_iou: 0.6521 
                                             score: 76.7359  base_score: 87.4120  novel_score: 11.1742 
                                             data_time: 0.0165  time: 0.2565

Thanks.

Missing ovsam.py File - Upload Inquiry

Hello,

I encountered a ModuleNotFoundError related to the absence of the ovsam.py file. Could you please confirm if it will be uploaded?

Error Details:

ModuleNotFoundError: Failed to import seg.models.detectors in seg/configs/ovsam/ovsam_coco_rn50x16_point.py, line 7 for No module named 'seg.models.detectors.ovsam'

Thank you!

Problem regarding to creating the environment

After running conda install pytorch torchvision torchaudio cuda-toolkit pytorch-cuda==12.1 -c pytorch -c "nvidia/label/cuda-12.1.0", there is an error like:

Could not solve for environment specs
The following package could not be installed
└─ pytorch-cuda 12.1 is not installable because it requires
└─ libnvjitlink >=12.1.105,<12.2.0 , which does not exist (perhaps a missing channel).

Can you help me with this? Thank you so much.

RuntimeError: GET was unable to find an engine to execute this computation

I prepared the environment and wanted to run inference, but running bash tools/dist.sh test seg/configs/ovsam/ovsam_coco_rn50x16_point.py 8 fails with a "GET was unable to find an engine" error:

File "/mnt/NewDataShare/D4/common/wbzhou/MLLM/ovsam/seg/models/data_preprocessor/ovsam_preprocessor.py", line 193, in forward
  gt_instances.point_coords = get_center_coords(
File "/mnt/NewDataShare/D4/common/wbzhou/MLLM/ovsam/seg/models/data_preprocessor/ovsam_preprocessor.py", line 24, in

Is this issue due to incorrect environment configuration?
torch==2.1.2+cu121 torchvision==0.16.2+cu121 mmcv==2.1.0 mmdet==3.3.0

How to predict IoU in OVSAM?

Hello, I wonder how to predict IoU in OVSAM.

The paper states that there are three tokens, including iou, label, and mask tokens, but the weights of the iou_token are not found in the model ('clip2sam_coco_rn50x16.pth'). There are only two tokens' weights (mask and label tokens) in that checkpoint.
Besides, by comparing with the SAM decoder code, I found that you replaced the original iou_token position with the label_token. Once I obtain the iou_token, how do I predict the IoU in the code?

My questions are as follows:

  1. Where can I get the iou_token weights?
  2. Given iou_token weights, how should the code for IoU prediction be implemented? (A sketch follows below.)
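
On question 2, a SAM-style IoU prediction head is typically a small MLP applied to the decoder output at the iou_token position; a minimal sketch under that assumption, not the authors' implementation:

    import torch
    import torch.nn as nn

    class IoUHead(nn.Module):
        # Small MLP mapping the iou_token's decoder output to one IoU estimate per candidate mask.
        def __init__(self, dim: int = 256, num_masks: int = 4):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(),
                nn.Linear(dim, dim), nn.ReLU(),
                nn.Linear(dim, num_masks),
            )

        def forward(self, iou_token_out: torch.Tensor) -> torch.Tensor:
            # iou_token_out: (B, dim) hidden state at the iou_token position after the mask decoder.
            return self.mlp(iou_token_out)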

How is $Q_{label}$ updated?

Hi. As mentioned in your paper, $Q_{label}$ is the key to CLIP2SAM. I noticed that $Q_{label}$ is a learnable token, am I right? The paper also mentions: 'The final labels are derived by calculating the distance between the refined label token and the CLIP text embedding, as in Equ. (1)'. That means $Q_{label}$ is aligned with the text embeddings, and the class label is then obtained through cosine similarity. However, I found that in your code the RoI embeddings do not include $Q_{label}$, as follows:

roi_feats = roi_feats[:, None] + 0 * cls_embed

So where does $Q_{label}$ get the gradient for updating? This confuses me. Looking forward to your reply. Thank you in advance!
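
As a small illustration of the behaviour being asked about, a `0 * w` term keeps a tensor in the computation graph but contributes zero gradient through that term alone:

    import torch

    w = torch.randn(4, requires_grad=True)  # stands in for cls_embed
    x = torch.randn(3, 4)                   # stands in for roi_feats
    (x[:, None] + 0 * w).sum().backward()
    print(w.grad)                           # all zeros: this term by itself does not update w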

_pickle.UnpicklingError: invalid load key, 'v'.

Hello,

When I attempt to execute the test case using the following command:

python tools/test.py seg/configs/ovsam/ovsam_coco_rn50x16_point.py

I encountered the following error. Could you please guide me on how to resolve it? Any assistance would be greatly appreciated.

Error Details:

Traceback (most recent call last):
  File "tools/test.py", line 177, in <module>
    main()
  File "tools/test.py", line 141, in main
    runner = Runner.from_cfg(cfg)
  ...
  ...
  ...
  _pickle.UnpicklingError: invalid load key, 'v'.

Thank you!
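
An "invalid load key" from torch.load usually means the .pth file on disk is not a real checkpoint, for example an incomplete download or a Git LFS pointer text file (which begins with the word "version", hence the load key 'v'). A quick hedged check, with a hypothetical path:

    # Peek at the first bytes of the checkpoint. A real torch checkpoint starts with a zip
    # signature (b'PK\x03\x04') or a pickle protocol byte; readable text means a bad download.
    ckpt_path = 'models/ovsam_coco_rn50x16.pth'  # hypothetical path; use your actual file
    with open(ckpt_path, 'rb') as f:
        print(f.read(64))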

Minimum CUDA Memory Required for SAM2CLIP Training

Thank you for all your hard work!
I encountered a "CUDA out of memory" error when running "bash tools/dist.sh train seg/configs/sam2clip/sam2clip_vith_rn50x16.py 8" on 8 RTX 2080 Ti GPUs to train SAM2CLIP. What is the minimum CUDA memory required for SAM2CLIP training?
I'm looking forward to your reply.
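
For what it's worth, the usual mmengine-level levers for fitting training into less GPU memory are mixed precision and gradient accumulation; a hedged config sketch (key names follow mmengine conventions, and the optimizer settings are placeholders rather than this repo's defaults):

    # Hypothetical optim_wrapper override in the training config.
    optim_wrapper = dict(
        type='AmpOptimWrapper',  # fp16 autocast to reduce activation memory
        accumulative_counts=2,   # gradient accumulation keeps the effective batch size
        optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.05),  # placeholder values
    )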

How can I evaluate on LVIS dataset?

Thanks for sharing your excellent work!

As described in the title, could you provide some guidance on validating OVSAM on the LVIS dataset?

time

How long does training and inference take respectively?

FileNotFoundError

Hi, could you please provide the download link for RN50x16_CocoOVDataset.pth? I couldn't find the relevant download link.

Evaluation scripts on LVIS dataset

Hi @HarborYuan,

Can I know when the test scripts for the LVIS dataset will be available? Five days have flown by since you last replied, and three weeks since this issue was created.

Hoping for your updates!

Originally posted by @Dyb3438 in #34 (comment)

Completed

You need to modify [this](https://github.com/HarborYuan/ovsam/blob/1d4dfb287fe113e8ecd60f76b4385c5506f566ca/seg/configs/ovsam/ovsam_coco_rn50x16_point.py#L13) config file ([file](https://github.com/HarborYuan/ovsam/blob/1d4dfb287fe113e8ecd60f76b4385c5506f566ca/seg/configs/_base_/datasets/coco_ov_instance_lsj.py)) to support more datasets.

To write such a config, you may need to write a new dataset class starting from COCO and import it into your config.

Originally posted by @HarborYuan in #27 (comment)
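
Following the reply above, the new dataset class would typically start from mmdet's CocoDataset and be registered so the config can reference it; an illustrative sketch with placeholder names, not code from the repo:

    from mmdet.datasets import CocoDataset
    from mmdet.registry import DATASETS

    @DATASETS.register_module()
    class LVISLikeDataset(CocoDataset):
        # Placeholder metainfo; fill `classes` with the full LVIS category name list.
        METAINFO = dict(classes=('person', 'bicycle'))

mmdet also ships LVIS dataset classes (e.g. LVISV1Dataset), which may be a more direct starting point than subclassing COCO by hand.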
