Error: "RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040" about text_search HOT 4 OPEN

npovey commented on May 27, 2024

Error: "RuntimeError: shape '[1, 0, 2]' is invalid for input of size 15319040"
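
For readers new to this message: a reshape only succeeds when the product of the target dimensions equals the number of elements in the input. The shape [1, 0, 2] describes 1 * 0 * 2 = 0 elements, so it can never hold 15319040 values, which suggests one dimension was computed as 0 somewhere upstream. The sketch below reproduces that arithmetic in plain Python. `check_view` is a hypothetical helper written for illustration, not a PyTorch function.

```python
from math import prod


def check_view(numel: int, shape: list[int]) -> None:
    """Mimic the element-count check behind the reported error (illustration only)."""
    if prod(shape) != numel:
        raise RuntimeError(f"shape '{shape}' is invalid for input of size {numel}")


# 1 * 0 * 2 == 0, which cannot match 15319040 input elements.
try:
    check_view(15319040, [1, 0, 2])
    message = "ok"
except RuntimeError as err:
    message = str(err)

print(message)
```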

from text_search.

Comments (4)

pkufool commented on May 27, 2024

I think I can reproduce this issue with your environment.

Here are my logs:

python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt
2023-08-31 15:27:31,194 INFO [recognize.py:323] Decoding started
2023-08-31 15:27:31,197 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:27:31,268 INFO [recognize.py:341] device: cuda:0
2023-08-31 15:27:31,269 INFO [recognize.py:343] Loading jit model
2023-08-31 15:27:44,860 INFO [recognize.py:299] cuts processed until now is 20
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:243.)
  return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
  return forward_call(*args, **kwargs)
Traceback (most recent call last):
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
    main()
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
    run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)                                                
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
    decode_dataset(                     
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
    hyps, timestamps, scores = decode_one_batch(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
    encoder_out, encoder_out_lens = model.encoder(
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):                               
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):                               
RuntimeError: shape '[1, 0, 2]' is invalid for input of size 7659520

When I ran export PYTORCH_NVFUSER_DISABLE=fallback first, the logs became:

python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt 
2023-08-31 15:33:15,659 INFO [recognize.py:323] Decoding started
2023-08-31 15:33:15,663 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:33:15,737 INFO [recognize.py:341] device: cuda:0
2023-08-31 15:33:15,737 INFO [recognize.py:343] Loading jit model
2023-08-31 15:33:29,322 INFO [recognize.py:299] cuts processed until now is 20
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
Traceback (most recent call last):
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
    main()
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
    run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
    decode_dataset(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
    hyps, timestamps, scores = decode_one_batch(
  File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
    encoder_out, encoder_out_lens = model.encoder(
  File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: dims.value().size() == self->getMaybeRFactorDomain().size() INTERNAL ASSERT FAILED at "../third_party/nvfuser/csrc/parser.cpp":3399, please report a bug to PyTorch.

I think it is a bug in the new version of PyTorch.
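
The two debugging variables named in the warnings above can also be set per run instead of exported in the shell. This is only a sketch: `nvfuser_debug_env` is a hypothetical helper, and the command list simply repeats the invocation from the logs, so it assumes the same working directory and files.

```python
import os
import subprocess


def nvfuser_debug_env(base=None):
    """Return a copy of the environment with the nvFuser debug variables set."""
    env = dict(os.environ if base is None else base)
    # Both names and values are taken from the PyTorch warnings in the logs.
    env["PYTORCH_NVFUSER_DISABLE"] = "fallback"
    env["PYTORCH_JIT_LOG_LEVEL"] = "manager.cpp"
    return env


# Same command as in the logs above; paths are relative to examples/libriheavy.
cmd = [
    "python", "tools/recognize.py",
    "--world-size", "1",
    "--manifest-in", "data/manifests/librilight_chunk_cuts_small.jsonl.gz",
    "--manifest-out", "librilight_cuts_test2.jsonl.gz",
    "--nn-model-filename", "exp/exp/jit_script.pt",
    "--tokens", "exp/data/lang_bpe_500/tokens.txt",
]


def run_with_debug_env():
    """Run the recognizer with the debug environment (not called on import)."""
    return subprocess.run(cmd, env=nvfuser_debug_env(), check=False)
```

Calling `run_with_debug_env()` should surface the internal assert shown above rather than the fallback warning, assuming the same PyTorch build.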

danpovey commented on May 27, 2024

Maybe we can somehow figure out which op it was running, so we can work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.

npovey commented on May 27, 2024

Not sure if it is relevant, but I use the same environment to train icefall models. I was able to train a model for 150 epochs with an icefall recipe in the PyTorch 2.0.1 environment given above. I only get the error when running stage 3 of the k2-fsa/text_search script, as described above:
https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh

pkufool commented on May 27, 2024

> Maybe we can somehow figure out which op it was running, so we can work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.

I can't see any stack trace, but I think we can first try re-exporting the model with PyTorch 2.0.1.
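
Since both tracebacks go through nvFuser, another experiment is to run the scripted model under a different TorchScript fuser and check whether the error goes away. The sketch below is untested against this issue and makes several assumptions: `TinyEncoder` is a hypothetical stand-in for the real `exp/exp/jit_script.pt`, and `torch.jit.fuser("fuser1")` selects the NNC fuser instead of nvFuser for the duration of the block.

```python
import io

import torch


class TinyEncoder(torch.nn.Module):
    """Hypothetical stand-in for the real encoder, for illustration only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2.0


# Re-export with the currently installed PyTorch, as suggested above.
scripted = torch.jit.script(TinyEncoder())
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

# Run inference with the NNC fuser ("fuser1") selected instead of nvFuser.
with torch.jit.fuser("fuser1"):
    out = reloaded(torch.ones(2, 3))

print(out.shape)
```

If the real model runs cleanly inside the `with` block, that would point at nvFuser rather than at the exported model itself.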
