Comments (4)
I think I can reproduce this issue with your envs.
Here are my logs:
python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt
2023-08-31 15:27:31,194 INFO [recognize.py:323] Decoding started
2023-08-31 15:27:31,197 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:27:31,268 INFO [recognize.py:341] device: cuda:0
2023-08-31 15:27:31,269 INFO [recognize.py:343] Loading jit model
2023-08-31 15:27:44,860 INFO [recognize.py:299] cuts processed until now is 20
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
(Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:243.)
return forward_call(*args, **kwargs)
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
(Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
return forward_call(*args, **kwargs)
Traceback (most recent call last):
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
main()
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
decode_dataset(
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
hyps, timestamps, scores = decode_one_batch(
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
encoder_out, encoder_out_lens = model.encoder(
File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: shape '[1, 0, 2]' is invalid for input of size 7659520
When I set `export PYTORCH_NVFUSER_DISABLE=fallback`, the logs became:
python tools/recognize.py --world-size 1 --manifest-in data/manifests/librilight_chunk_cuts_small.jsonl.gz --manifest-out librilight_cuts_test2.jsonl.gz --nn-model-filename exp/exp/jit_script.pt --tokens exp/data/lang_bpe_500/tokens.txt
2023-08-31 15:33:15,659 INFO [recognize.py:323] Decoding started
2023-08-31 15:33:15,663 INFO [recognize.py:336] {'subsampling_factor': 4, 'frame_shift_ms': 10, 'beam_size': 4, 'world_size': 1, 'master_port': 12354, 'manifest_in': PosixPath('data/manifests/librilight_chunk_cuts_small.jsonl.gz'), 'manifest_out': PosixPath('librilight_cuts_test2.jsonl.gz'), 'log_dir': PosixPath('logs'), 'nn_model_filename': 'exp/exp/jit_script.pt', 'tokens': 'exp/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'max_duration': 600.0, 'return_cuts': True, 'num_mel_bins': 80, 'num_workers': 8, 'manifest_out_dir': PosixPath('.'), 'suffix': '.jsonl.gz', 'cuts_filename': 'librilight_cuts_test2', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2023-08-31 15:33:15,737 INFO [recognize.py:341] device: cuda:0
2023-08-31 15:33:15,737 INFO [recognize.py:343] Loading jit model
2023-08-31 15:33:29,322 INFO [recognize.py:299] cuts processed until now is 20
/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %626 : int = prim::profile_ivalue(%624)
does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
return forward_call(*args, **kwargs)
Traceback (most recent call last):
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 433, in <module>
main()
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 426, in main
run(rank=0, world_size=world_size, args=args, in_cuts=in_cuts)
File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 365, in run
decode_dataset(
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 287, in decode_dataset
hyps, timestamps, scores = decode_one_batch(
File "/star-kw/kangwei/code/text_search/examples/libriheavy/tools/recognize.py", line 184, in decode_one_batch
encoder_out, encoder_out_lens = model.encoder(
File "/star-kw/kangwei/dev_tools/anaconda/envs/textsearch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: dims.value().size() == self->getMaybeRFactorDomain().size() INTERNAL ASSERT FAILED at "../third_party/nvfuser/csrc/parser.cpp":3399, please report a bug to PyTorch.
I think it is a bug in the new version of PyTorch.
from text_search.
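Since both failures surface inside the nvfuser codegen path (compileCudaFusionGroup / runCudaFusionGroup), one quick check is to force TorchScript onto the older NNC fuser when calling the encoder. This is only a debugging sketch, not a verified fix: `run_encoder_without_nvfuser` and its arguments are hypothetical names mirroring the call in recognize.py, and `torch.jit.fuser("fuser1")` selects NNC in PyTorch 2.0.x.

```python
import torch


def run_encoder_without_nvfuser(model, features, feature_lens):
    """Call the encoder with nvfuser bypassed.

    "fuser1" selects the NNC fuser in PyTorch 2.0.x, so the
    TorchScript interpreter never enters compileCudaFusionGroup.
    If the shape error disappears under this context, nvfuser is
    the likely culprit.
    """
    with torch.jit.fuser("fuser1"):
        return model.encoder(features, feature_lens)
```

If this runs cleanly, the crash can probably be worked around by wrapping the whole decode loop in the same context.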
Maybe we can somehow figure out what op it was doing, to work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.
from text_search.
Not sure if it is relevant: I use the same env to train icefall models. I was able to use an icefall recipe and train a model for 150 epochs with the PyTorch 2.0.1 env given above. I only get the error when running stage 3 of the k2-fsa/text_search
https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh script, as described above.
from text_search.
Maybe we can somehow figure out what op it was doing, to work around it? It's a shame if we can't run inference with our models in PyTorch 2.0.1.
I can't see any stack traces, but I think we can first try exporting the model with PyTorch 2.0.1.
from text_search.
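To rule out a mismatch between the export-time and inference-time PyTorch versions, a minimal re-export under the same 2.0.1 env might look like this. This is a sketch under assumptions: the real icefall export script also loads the trained checkpoint and model config, which are omitted here, and `export_jit` is a made-up helper name.

```python
import torch


def export_jit(model: torch.nn.Module, out_path: str) -> None:
    """Script and save a model under the currently installed PyTorch.

    torch.jit.script (rather than trace) is used so that any
    data-dependent control flow in the encoder is preserved.
    """
    model.eval()
    scripted = torch.jit.script(model)
    scripted.save(out_path)
```

Loading the freshly exported `jit_script.pt` with the same interpreter that will run recognize.py removes one variable from the debugging.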