Comments (10)
I was able to repro this segfault on sjc-snva-t3002.
Config:
- Llama3
- decode-only
- greedy decode
![image](https://private-user-images.githubusercontent.com/114512306/345133952-a231f73b-d780-47f7-bbd6-cf1e05a0af14.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzOTcwNDQsIm5iZiI6MTcyMTM5Njc0NCwicGF0aCI6Ii8xMTQ1MTIzMDYvMzQ1MTMzOTUyLWEyMzFmNzNiLWQ3ODAtNDdmNy1iYmQ2LWNmMWUwNWEwYWYxNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQxMzQ1NDRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04MWIwOTI3MmIxMDJjNjEyMGFhNTJhNzg5MTBkYThjNGViOGU0YWMxM2JjMGU4NGY0NzM0NTE1NDZlYjY2NzcwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.RjIy0C-wxfllqttDjMAqXNc-z6TXlAbf0-5skm1OEmg)
2024-07-02 16:45:32.806 | DEBUG | ttnn.operations.core:from_torch_and_dump:739 - Generating cache for /proj_sw/llama3-data-cache/llama3_attn_masks_decode_25_multi_device_dtype_BFLOAT16_layout_TILE.bin of shape ttnn.Shape([1, 1, 32, 32]), dtype BFLOAT16, layout TILE
--Type <RET> for more, q to quit, c to continue without paging--

Thread 223 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe82fd7700 (LWP 773289)]
0x00007fff88834d03 in tt::tt_metal::allocator::FreeList::deallocate(unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
(gdb) bt
#0 0x00007fff88834d03 in tt::tt_metal::allocator::FreeList::deallocate(unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#1 0x00007fff88875f17 in tt::tt_metal::CommandQueue::run_command_impl(tt::tt_metal::CommandInterface const&) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#2 0x00007fff88873b82 in tt::tt_metal::EnqueueDeallocateBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Allocator&, unsigned int, tt::tt_metal::BufferType, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#3 0x00007fff89071d70 in std::__1::__function::__func<tt::tt_metal::Tensor::deallocate(bool)::$_0::operator()<tt::tt_metal::MultiDeviceStorage>(tt::tt_metal::MultiDeviceStorage&) const::{lambda(tt::tt_metal::Device*)#1}, std::__1::allocator<{lambda(tt::tt_metal::Device*)#1}>, void (tt::tt_metal::Device*)>::operator()(tt::tt_metal::Device*&&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#4 0x00007fff8907200f in std::__1::__function::__func<tt::tt_metal::Tensor::deallocate(bool)::$_0::operator()<tt::tt_metal::MultiDeviceStorage>(tt::tt_metal::MultiDeviceStorage&) const::{lambda()#1}, std::__1::allocator<{lambda()#1}>, void ()>::operator()() () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#5 0x00007fff8881429b in tt::WorkExecutor::run_worker() () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#6 0x00007fff8881450b in void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (tt::WorkExecutor::*)(), tt::WorkExecutor*> >(void*) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#7 0x00007ffff7db5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007ffff7eef353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) list
1 <built-in>: No such file or directory.
(gdb)
from tt-metal.
When I run with async disabled, I see a variety of errors.
2024-07-02 17:59:25.628 | DEBUG | ttnn.operations.core:from_torch_and_dump:739 - Generating cache for /proj_sw/llama3-data-cache/llama3_attn_masks_decode_109_multi_device_dtype_BFLOAT16_layout_TILE.bin of shape ttnn.Shape([1, 1, 32, 128]), dtype BFLOAT16, layout TILE
--Type <RET> for more, q to quit, c to continue without paging--

Thread 218 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe857dc700 (LWP 820677)]
0x00007fff88833ff5 in tt::tt_metal::allocator::FreeList::update_left_aligned_allocated_block_connections(boost::local_shared_ptr<tt::tt_metal::allocator::FreeList::Block>, boost::local_shared_ptr<tt::tt_metal::allocator::FreeList::Block>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
(gdb) bt
#0 0x00007fff88833ff5 in tt::tt_metal::allocator::FreeList::update_left_aligned_allocated_block_connections(boost::local_shared_ptr<tt::tt_metal::allocator::FreeList::Block>, boost::local_shared_ptr<tt::tt_metal::allocator::FreeList::Block>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#1 0x00007fff8883432f in tt::tt_metal::allocator::FreeList::allocate_slice_of_free_block(boost::local_shared_ptr<tt::tt_metal::allocator::FreeList::Block>, unsigned long, unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#2 0x00007fff88834754 in tt::tt_metal::allocator::FreeList::allocate(unsigned long, bool, unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#3 0x00007fff88837075 in tt::tt_metal::allocator::BankManager::allocate_buffer(unsigned int, unsigned int, bool, tt::umd::xy_pair, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#4 0x00007fff88838f17 in tt::tt_metal::allocator::base_alloc(tt::tt_metal::AllocatorConfig const&, tt::tt_metal::allocator::BankManager&, unsigned long, unsigned long, bool, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#5 0x00007fff8883905f in tt::tt_metal::allocator::allocate_buffer(tt::tt_metal::Allocator&, unsigned int, unsigned int, tt::tt_metal::BufferType const&, bool, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#6 0x00007fff8887398e in tt::tt_metal::EnqueueAllocateBufferImpl(tt::tt_metal::AllocBufferMetadata) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#7 0x00007fff888760c4 in tt::tt_metal::CommandQueue::run_command_impl(tt::tt_metal::CommandInterface const&) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#8 0x00007fff88873a85 in tt::tt_metal::EnqueueAllocateBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Buffer*, bool, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#9 0x00007fff88822c63 in tt::tt_metal::Buffer::allocate() () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#10 0x00007fff888215d8 in tt::tt_metal::Buffer::Buffer(tt::tt_metal::Device*, unsigned long, unsigned long, tt::tt_metal::BufferType, tt::tt_metal::TensorMemoryLayout, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#11 0x00007fff8905c6a8 in std::__1::shared_ptr<tt::tt_metal::Buffer> std::__1::allocate_shared[abi:ue170006]<tt::tt_metal::Buffer, std::__1::allocator<tt::tt_metal::Buffer>, tt::tt_metal::Device*&, unsigned int&, unsigned int&, tt::tt_metal::BufferType const&, void>(std::__1::allocator<tt::tt_metal::Buffer> const&, tt::tt_metal::Device*&, unsigned int&, unsigned int&, tt::tt_metal::BufferType const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#12 0x00007fff88fa6e4a in tt::tt_metal::tensor_impl::allocate_buffer_on_device(unsigned int, tt::tt_metal::Device*, tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#13 0x00007fff8906e84a in tt::tt_metal::create_device_tensor(tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#14 0x00007fff88c37418 in tt::tt_metal::operation::generic_create_output_tensors<tt::operations::primary::Matmul> () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#15 0x00007fff88c2f1ba in tt::operations::primary::Matmul::create_output_tensors(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor> > const&) const ()
from tt-metal.
2024-07-02 18:42:19.973 | DEBUG | ttnn.operations.core:from_torch_and_dump:739 - Generating cache for /proj_sw/llama3-data-cache/llama3_rot_mat_decode_129_multi_device_dtype_BFLOAT16_layout_TILE.bin of shape ttnn.Shape([1, 32, 128, 128]), dtype BFLOAT16, layout TILE
2024-07-02 18:42:19.977 | DEBUG | ttnn.operations.core:from_torch_and_dump:739 - Generating cache for /proj_sw/llama3-data-cache/llama3_attn_masks_decode_129_multi_device_dtype_BFLOAT16_layout_TILE.bin of shape ttnn.Shape([1, 1, 32, 160]), dtype BFLOAT16, layout TILE
--Type <RET> for more, q to quit, c to continue without paging--

Thread 219 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe86fdf700 (LWP 838272)]
0x00007fff88834d03 in tt::tt_metal::allocator::FreeList::deallocate(unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
(gdb) list
1 <built-in>: No such file or directory.
(gdb) bt
#0 0x00007fff88834d03 in tt::tt_metal::allocator::FreeList::deallocate(unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#1 0x00007fff88875f17 in tt::tt_metal::CommandQueue::run_command_impl(tt::tt_metal::CommandInterface const&) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#2 0x00007fff88873b82 in tt::tt_metal::EnqueueDeallocateBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Allocator&, unsigned int, tt::tt_metal::BufferType, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#3 0x00007fff89071d70 in std::__1::__function::__func<tt::tt_metal::Tensor::deallocate(bool)::$_0::operator()<tt::tt_metal::MultiDeviceStorage>(tt::tt_metal::MultiDeviceStorage&) const::{lambda(tt::tt_metal::Device*)#1}, std::__1::allocator<{lambda(tt::tt_metal::Device*)#1}>, void (tt::tt_metal::Device*)>::operator()(tt::tt_metal::Device*&&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#4 0x00007fff8907200f in std::__1::__function::__func<tt::tt_metal::Tensor::deallocate(bool)::$_0::operator()<tt::tt_metal::MultiDeviceStorage>(tt::tt_metal::MultiDeviceStorage&) const::{lambda()#1}, std::__1::allocator<{lambda()#1}>, void ()>::operator()() () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#5 0x00007fff8881429b in tt::WorkExecutor::run_worker() () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#6 0x00007fff8881450b in void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (tt::WorkExecutor::*)(), tt::WorkExecutor*> >(void*) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#7 0x00007ffff7db5609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007ffff7eef353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
from tt-metal.
2024-07-02 18:57:17.853 | DEBUG | ttnn.operations.core:from_torch_and_dump:739 - Generating cache for /proj_sw/llama3-data-cache/llama3_attn_masks_decode_149_multi_device_dtype_BFLOAT16_layout_TILE.bin of shape ttnn.Shape([1, 1, 32, 160]), dtype BFLOAT16, layout TILE
--Type <RET> for more, q to quit, c to continue without paging--

Thread 219 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe86fdf700 (LWP 857393)]
0x00007fff88833d13 in tt::tt_metal::allocator::FreeList::search_first(unsigned long, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
(gdb) bt
#0 0x00007fff88833d13 in tt::tt_metal::allocator::FreeList::search_first(unsigned long, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#1 0x00007fff888347a9 in tt::tt_metal::allocator::FreeList::allocate(unsigned long, bool, unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#2 0x00007fff888378d5 in tt::tt_metal::allocator::BankManager::allocate_buffer(unsigned int, unsigned int, bool, tt::umd::xy_pair, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#3 0x00007fff88839777 in tt::tt_metal::allocator::base_alloc(tt::tt_metal::AllocatorConfig const&, tt::tt_metal::allocator::BankManager&, unsigned long, unsigned long, bool, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#4 0x00007fff888398bf in tt::tt_metal::allocator::allocate_buffer(tt::tt_metal::Allocator&, unsigned int, unsigned int, tt::tt_metal::BufferType const&, bool, std::__1::optional<unsigned int>) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#5 0x00007fff888741ee in tt::tt_metal::EnqueueAllocateBufferImpl(tt::tt_metal::AllocBufferMetadata) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#6 0x00007fff88876924 in tt::tt_metal::CommandQueue::run_command_impl(tt::tt_metal::CommandInterface const&) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#7 0x00007fff888742e5 in tt::tt_metal::EnqueueAllocateBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Buffer*, bool, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#8 0x00007fff88822c73 in tt::tt_metal::Buffer::allocate() () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#9 0x00007fff888215e8 in tt::tt_metal::Buffer::Buffer(tt::tt_metal::Device*, unsigned long, unsigned long, tt::tt_metal::BufferType, tt::tt_metal::TensorMemoryLayout, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&, bool) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
#10 0x00007fff8905c6a8 in std::__1::shared_ptr<tt::tt_metal::Buffer> std::__1::allocate_shared[abi:ue170006]<tt::tt_metal::Buffer, std::__1::allocator<tt::tt_metal::Buffer>, tt::tt_metal::Device*&, unsigned int&, unsigned int&, tt::tt_metal::BufferType const&, void>(std::__1::allocator<tt::tt_metal::Buffer> const&, tt::tt_metal::Device*&, unsigned int&, unsigned int&, tt::tt_metal::BufferType const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#11 0x00007fff88fa6e4a in tt::tt_metal::tensor_impl::allocate_buffer_on_device(unsigned int, tt::tt_metal::Device*, tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#12 0x00007fff8906e84a in tt::tt_metal::create_device_tensor(tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&) () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#13 0x00007fff88c37418 in tt::tt_metal::operation::generic_create_output_tensors<tt::operations::primary::Matmul> () from /home/cglagovich/tt-metal/build/lib/libtt_eager.so
#14 0x00007fff88c2f1ba in tt::operations::primary::Matmul::create_output_tensors(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor> > const&) const ()
from tt-metal.
I was not able to repro this segfault with async queues disabled.
In one of the `deallocate` segfaults in a worker thread, I see that the main thread is involved in sending a tensor to device.
This made me wonder if this code pattern is the culprit:
rot_mats = ttnn.as_tensor(
    rot_mat,
    dtype=ttnn.bfloat16,
    layout=ttnn.TILE_LAYOUT,
    device=self.device_mesh,
    cache_file_name=cache_name(f"rot_mat_decode_{start_pos}"),
    memory_config=self.model_config["DRAM_MEMCFG"],
    mesh_mapper=ReplicateTensorToMesh(self.device_mesh),
)
rot_mats = ttnn.to_device(rot_mats, self.device_mesh)
The `to_device` call should be unnecessary but not incorrect. I ran the test again with this call removed, but the segfaults did not go away.
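For readers unfamiliar with the frames in the traces above (`FreeList::search_first`, `FreeList::allocate`, `FreeList::deallocate`), here is a toy first-fit free list. It is only an illustration of the general technique: the class `ToyFreeList` and its fields are hypothetical and are not the tt-metal implementation. The lock shows one way to serialize allocate and deallocate calls arriving from different threads; whether missing synchronization is the cause of these segfaults is an assumption, not something established in this thread.

```python
import threading


class ToyFreeList:
    """Toy first-fit free list over one contiguous address range (illustrative only)."""

    def __init__(self, size):
        self._lock = threading.Lock()   # serializes allocate/deallocate across threads
        self._free = [(0, size)]        # sorted list of (address, size) free blocks
        self._allocated = {}            # address -> size of live allocations

    def allocate(self, size):
        with self._lock:
            for i, (addr, block_size) in enumerate(self._free):
                if block_size >= size:  # first free block that fits
                    if block_size == size:
                        del self._free[i]
                    else:
                        # Carve the allocation off the front of the free block.
                        self._free[i] = (addr + size, block_size - size)
                    self._allocated[addr] = size
                    return addr
            raise MemoryError(f"not enough space to allocate {size} B")

    def deallocate(self, addr):
        with self._lock:
            size = self._allocated.pop(addr)  # raises KeyError on a double free
            self._free.append((addr, size))
            self._free.sort()
            # Merge free blocks that are now adjacent.
            merged = [self._free[0]]
            for block_addr, block_size in self._free[1:]:
                last_addr, last_size = merged[-1]
                if last_addr + last_size == block_addr:
                    merged[-1] = (last_addr, last_size + block_size)
                else:
                    merged.append((block_addr, block_size))
            self._free = merged
```

Without the lock, a worker thread running `deallocate` while another thread runs `allocate` could observe the block list mid-update, which is the kind of interleaving the worker-thread and main-thread observation above points at.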
from tt-metal.
Repro instructions:
- branch: cglagovich/9837
- Build in release mode
- Set the CPU frequency governor to `ondemand` (seems to help with repro)
gdb --args python -m pytest -svv models/demos/t3000/llama3_70b/demo/demo.py::test_LlamaModel_demo[wormhole_b0-True-check_disabled-greedy-tt-70b-T3000-80L-decode_only-text_completion-llama3]
Expected output:
--Type <RET> for more, q to quit, c to continue without paging--
Thread 224 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffe827d6700 (LWP 369024)]
0x00007fff88834d03 in tt::tt_metal::allocator::FreeList::deallocate(unsigned long) () from /home/cglagovich/tt-metal/build/lib/libtt_metal.so
(gdb) q
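Since the `ondemand` governor seems to matter for the repro, it may help to confirm the setting before launching gdb. A minimal sketch, assuming a Linux host that exposes cpufreq through the standard sysfs path `/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`; the helper names `read_governors` and `all_ondemand` are hypothetical and not part of tt-metal.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq in sysfs."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_ondemand(governors):
    """True only if at least one CPU was found and all of them use 'ondemand'."""
    return bool(governors) and all(g == "ondemand" for g in governors.values())
```

Calling `all_ondemand(read_governors())` before the gdb run gives a quick yes/no; an empty result means the host does not expose cpufreq at that path.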
from tt-metal.
On a new T3000 machine, I hit 6 crashes before the first run reached 2816 tokens generated in a single sequence:
![2024-07-03-Kuaishou-ttsmi](https://private-user-images.githubusercontent.com/131909505/345622733-f9b65c7a-a8c7-4bb8-8159-07866decf29c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEzOTcwNDQsIm5iZiI6MTcyMTM5Njc0NCwicGF0aCI6Ii8xMzE5MDk1MDUvMzQ1NjIyNzMzLWY5YjY1YzdhLWE4YzctNGJiOC04MTU5LTA3ODY2ZGVjZjI5Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzE5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcxOVQxMzQ1NDRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jMzg1NmYwMWMwM2M2YTA1MjZlNjg3ZjQ3MmY1YzZmNjUyODM0NzIxZjU1YzQyNmMwMmZlYjM0YmYzZGMyY2VjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.g4qY5leFTtbeOEipPOFx63RgGL0ozcfxtqrgXfF7Iu8)
- OS: Ubuntu 20.04
- tt-kmd: 1.27.1
- firmware bundle: 80.8.12.0
- tt-metal commit: a053bc8
I did a soft reset (`tt-smi -r 0,1,2,3`) after each crash and reran the first-run script.
crash 1:
(python_env) user@66a27c372dce:~/tt-metal-llama3-70b/src$ python tt_metal_impl/demo/demo_llama3_first_run_4k.py
...
2024-07-03 11:26:09.820 | INFO | __main__:run_decode:199 - Loop 5
free(): invalid pointer
Aborted (core dumped)
crash 2:
(python_env) user@66a27c372dce:~/tt-metal-llama3-70b/src$ python tt_metal_impl/demo/demo_llama3_first_run_4k.py
...
2024-07-03 11:33:32.714 | INFO | __main__:run_decode:199 - Loop 88
Always | FATAL | Out of Memory: Not enough space to allocate 1048576 B DRAM buffer across 12 banks, where each bank needs to store 88064 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ ../tt_metal/impl/allocator/allocator.cpp:141: tt::exception
info:
Out of Memory: Not enough space to allocate 1048576 B DRAM buffer across 12 banks, where each bank needs to store 88064 B
backtrace:
--- tt::tt_metal::allocator::BankManager::allocate_buffer(unsigned int, unsigned int, bool, tt::umd::xy_pair, std::__1::optional<unsigned int>)
--- tt::tt_metal::allocator::base_alloc(tt::tt_metal::AllocatorConfig const&, tt::tt_metal::allocator::BankManager&, unsigned long, unsigned long, bool, std::__1::optional<unsigned int>)
--- tt::tt_metal::allocator::allocate_buffer(tt::tt_metal::Allocator&, unsigned int, unsigned int, tt::tt_metal::BufferType const&, bool, std::__1::optional<unsigned int>)
--- tt::tt_metal::EnqueueAllocateBufferImpl(tt::tt_metal::AllocBufferMetadata)
--- tt::tt_metal::CommandQueue::run_command_impl(tt::tt_metal::CommandInterface const&)
--- tt::tt_metal::EnqueueAllocateBuffer(tt::tt_metal::CommandQueue&, tt::tt_metal::Buffer*, bool, bool)
--- tt::tt_metal::Buffer::allocate()
--- tt::tt_metal::Buffer::Buffer(tt::tt_metal::Device*, unsigned long, unsigned long, tt::tt_metal::BufferType, tt::tt_metal::TensorMemoryLayout, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&, bool)
--- /tt-metal/build/lib/libtt_eager.so(+0x648958) [0x7f063a443958]
--- tt::tt_metal::tensor_impl::allocate_buffer_on_device(unsigned int, tt::tt_metal::Device*, tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&)
--- tt::tt_metal::create_device_tensor(tt::tt_metal::Shape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&)
--- /tt-metal/build/lib/libtt_eager.so(_ZN2tt8tt_metal9operation29generic_create_output_tensorsINS_10operations7primary6MatmulEEENS1_21program_output_helperIT_Xsr18has_create_programIS7_EE5valueEE4typeERKS7_RKNSt3__16vectorINS0_6TensorENSC_9allocatorISE_EEEENSC_8optionalINS0_8DataTypeEEENS0_6LayoutERKNSK_INS0_12MemoryConfigEEE+0x178) [0x7f063a047928]
--- tt::operations::primary::Matmul::create_output_tensors(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&) const
--- /tt-metal/build/lib/libtt_eager.so(+0x1ee299) [0x7f0639fe9299]
--- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::detail::run_device_operation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(std::__1::reference_wrapper<tt::tt_metal::CommandQueue>, tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>> const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&)
--- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::run<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>> const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&, unsigned char)
--- /tt-metal/build/lib/libtt_eager.so(+0x1ed88f) [0x7f0639fe888f]
--- /tt-metal/build/lib/libtt_eager.so(+0x1ecf7d) [0x7f0639fe7f7d]
--- tt::tt_metal::operation::launch_op(std::__1::function<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> (std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&)>&&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>>, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>>, bool)
--- /tt-metal/build/lib/libtt_eager.so(+0x1eb994) [0x7f0639fe6994]
--- /tt-metal/build/lib/libtt_eager.so(+0x24b709) [0x7f063a046709]
--- /tt-metal/build/lib/libtt_eager.so(+0x516e39) [0x7f063a311e39]
--- /tt-metal/build/lib/libtt_eager.so(+0x517b6f) [0x7f063a312b6f]
--- /tt-metal/build/lib/libtt_metal.so(+0x1579eb) [0x7f0639c319eb]
--- /tt-metal/build/lib/libtt_metal.so(+0x157c5b) [0x7f0639c31c5b]
--- /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f06a66c0609]
--- /lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f06a67fa353]
Aborted (core dumped)
crash 3 (same stack trace as above):
2024-07-03 11:40:37.964 | INFO | __main__:run_decode:199 - Loop 127
Always | FATAL | Out of Memory: Not enough space to allocate 1048576 B DRAM buffer across 12 banks, where each bank needs to store 88064 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ ../tt_metal/impl/allocator/allocator.cpp:141: tt::exception
crash 4 (same stack trace as above):
2024-07-03 11:49:00.270 | INFO | __main__:run_decode:199 - Loop 489
Always | FATAL | Out of Memory: Not enough space to allocate 1048576 B DRAM buffer across 12 banks, where each bank needs to store 88064 B
libc++abi: terminating due to uncaught exception of type std::runtime_error: TT_THROW @ ../tt_metal/impl/allocator/allocator.cpp:141: tt::exception
crash 5:
2024-07-03 11:57:17.461 | INFO | __main__:run_decode:199 - Loop 847
2024-07-03 11:57:17.573 | INFO | __main__:run_decode:199 - Loop 848
Segmentation fault (core dumped)
crash 6 (hang):
2024-07-03 12:13:12.789 | INFO | __main__:run_decode:199 - Loop 913
2024-07-03 12:13:12.903 | INFO | __main__:run_decode:199 - Loop 914
^C^C^C^C^CTerminated
Rerunning after this crash reached 2816 tokens and then hit the known issue #9839. That completes the first run; generation for 2k context is relatively reliable.
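The per-bank figure in the Out of Memory message for crashes 2 to 4 can be reproduced with a short calculation. This is a sketch under one assumption that the log does not state: the buffer is split into 2048 B pages (one 32x32 BFLOAT16 tile) that are spread evenly across the 12 DRAM banks, with each bank rounded up to a whole number of pages. The function name `per_bank_bytes` is hypothetical.

```python
# Assumed page size: one 32x32 tile of 2-byte BFLOAT16 values.
TILE_BYTES = 32 * 32 * 2


def per_bank_bytes(buffer_bytes, num_banks, page_bytes=TILE_BYTES):
    """Bytes each bank must hold when pages are spread evenly across banks."""
    num_pages = -(-buffer_bytes // page_bytes)     # ceiling division
    pages_per_bank = -(-num_pages // num_banks)    # ceiling division
    return pages_per_bank * page_bytes


# 1048576 B is 512 pages; 512 pages over 12 banks is at most 43 pages per bank.
print(per_bank_bytes(1048576, 12))  # 88064
```

Under that assumption the result matches the 88064 B per bank reported in the log, so the failing request is small: an allocation of that size failing suggests the banks were already full or fragmented by that point in the run.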
from tt-metal.
Hey @tstescoTT, would you mind running with this commit cherry-picked: 4558673? It resolved the segfault for me locally.
from tt-metal.
@tstescoTT - can you help repro and confirm?
from tt-metal.
@mbahnasTT confirms tested - can be closed
from tt-metal.